1 / 62

“Good Enough” Database Caching

“Good Enough” Database Caching. Hongfei Guo University of Wisconsin-Madison. Motivation — Scaling Google. …. Motivation — Scaling A DBMS By Caching. How to tell whether the cached data is “good enough” for an application? NO data quality requirements from the applications!

sonya-james
Download Presentation

“Good Enough” Database Caching

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. “Good Enough” Database Caching Hongfei Guo University of Wisconsin-Madison

  2. Motivation — Scaling Google

  3. Motivation — Scaling A DBMS By Caching How to tell whether the cached data is “good enough” for an application? • NO data quality requirements from the applications! • NO data quality guarantees from the caching DBMS! Application Server App specific code … Caching DBMS Updates Updates Asynchronous Backend DBMS

  4. Application Server Caching DBMS Backend DBMS The Thesis • Apps: Specifiesdata quality requirements in queries • Cache: Enforces data quality constraint [SIGMOD 2004] [SIGMOD 2004 Demo] • Cache admin: Specify local data quality to be maintained by cache (Data quality-centric database caching model) [TR 2005] [submitted for publication] • Data quality-aware adaptive cache management [ongoing work]

  5. Data Quality Metrics (informal) • Currency:The elapsed time since this copy becomes stale • Consistency: A query result is (snapshot) consistent iff it is as if evaluated from a snapshot of the master database • C&C: Currency & Consistency

  6. Roadmap • Background • Specifying data quality constraints in SQL • Data quality-centric caching model • Enforcing data quality constraints • Other research • Future directions

  7. Specifying Data Quality Constraints in SQL[Guo, Larson, Ramakrishnan and Goldstein, SIGMOD 2004] • Currency requirements • Consistency requirements • Extend SQL to specify relaxed C&C requirements • Formal semantics of C&C constraints

  8. Currency Requirements Example 1: The caching database keeps BookCopy • Customer A is about to purchase –he wants the data to be exactly current (High data quality is preferred) • Customer B is browsing –it is ok if the data is no more than 3 days out of sync (Quick response time is preferred)

  9. Currency Requirements Example 1: The caching database keeps BookCopy • Customer A is about to purchase –he wants the data to be exactly current (High data quality is preferred) • Customer B is browsing –it is ok if the data is no more than 3 days out of sync (Quick response time is preferred)

  10. Currency Requirements Example 1: The caching database keeps BookCopy • Customer A is about to purchase –he wants the data to be exactly current (High data quality is preferred) • Customer B is browsing –it is ok if the data is no more than 3 days out of sync (Quick response time is preferred) Different apps may have different currency requirements for the same query

  11. BookCopy bid title author 1 databases Raghu 2 databases Ullman ReviewCopy rid bid text 1 1 … 2 1 … 3 2 … Consistency Requirements Example 2: SELECT * FROM Books B, Reviews R WHERE B.bid = R.bid AND B.title = “Databases” Different apps may have different consistency requirements for the same query The whole query result be consistent Books be consistent & Reviews be consistent Each book be consistent with its reviews

  12. BookCopy bid title author 1 databases Raghu 2 databases Ullman ReviewCopy rid bid text 1 1 … 2 1 … 3 2 … Proposed SQL Syntax SELECT * FROM Books B, Reviews R WHERE B.bid = R.bid AND B.title = “Databases“ Consistency class Currency bound Group by CURRENCY BOUND 10 min ON (B, R) BY B.bid CURRENCY BOUND 10 min ON (B, R) CURRENCY BOUND 10 min ON (B), 30 min ON (R)

  13. Specifying Data quality Constraints in SQL: Contributions • Extend SQL to express C&C constraints • Single-block queries • Multi-block (i.e., nested) queries • Timeline constraint • Formal semantics of C&C constraints Provides correctness standard for using replicated or cached data

  14. Roadmap • Background • Specifying data quality constraints in SQL • Data quality-centric caching model • Enforcing data quality constraints • Other Research • Future directions

  15. Data Quality-Centric Caching Model[Guo, Larson and Ramakrishnan, submitted] • Cache data quality properties • Cache property specification • Maintenance and “safety”

  16. Why Define Cache Properties? Query processing Cache Properties (= contract) Cache maintenance

  17. Cache Properties (P+3C) • Presence — per object • Consistency — a set of objects • Completeness — per predicate • Currency — object staleness

  18. View 1 View 2 View 3 Basic Concepts Tables Object H1 Snapshots Master Database Cache H2

  19. View 1 View 2 View 3 Cache Property Examples Currency = now – stale point Consistent Complete Present H1 Master Database Stale point Cache H2

  20. Specifying Cache Properties • Specified as integrity constraints • Presence constraint • Consistency constraint • Completeness constraint • Presence correlation constraint • Consistency correlation constraint

  21. Presence Constraint AuthorCopy: 1 Alice Madison 2 Bob Madison 3 Cedric Seattle Backend DBMS AuthorList_PCT: 1 2 3 Caching DBMS

  22. Presence Constraint Partially materialized view [Zhou et al 2005] AuthorCopy: CREATE VIEW AuthorCopy AS SELECT * FROM Authors CREATE TABLE AuthorList_PCT (authorId int) ALTER VIEW AuthorCopy ADD ON authorId IN (SELECT authorId FROM authorId_PCT 1 Alice Madison 2 Bob Madison 3 Cedric Seattle control-key control-table AuthorList_PCT: PRESENCE 1 2 3

  23. Consistency Constraint AuthorCopy: Cache Region CREATE TABLE CityList_CsCT (city string) ALTER VIEW AuthorCopy ADD ON city IN (SELECT city FROM cityList_CsCT 1 Alice Madison 2 Bob Madison 3 Cedric Seattle Backend DBMS CityList_CsCT: AuthorList_PCT: AuthorList_PCT: Consistency Madison 1 2 3 1 2 3

  24. Completeness Constraint AuthorCopy: CREATE TABLE CityList_CpCT (city string) ALTER VIEW AuthorCopy ADD ON city IN (SELECT city FROM cityList_CsCT 1 Alice Madison 2 Bob Madison 3 Cedric Seattle Backend DBMS AuthorList_PCT: CityList_CpCT: AuthorList_PCT: Completeness Madison 1 3 1 3

  25. Presence Correlation Constraint AuthorList_PCT: AuthorCopy: 1 Alice Madison 2 Bob Madison 3 Cedric Seattle authorId 1 2 3 authorId BookCopy: Backend DBMS 111 1 aaa 222 1 bbb 333 2 ccc 444 3 ddd 555 3 eee ALTER VIEW BookCopy ADD PRESENCE ON authorId IN (SELECT authorId FROM AuthorCopy)

  26. Presence Correlation Constraint AuthorList_PCT: AuthorCopy: 1 Alice Madison 2 Bob Madison 3 Cedric Seattle authorId 1 2 3 authorId BookCopy: AuthorList_PCT 111 1 aaa 222 1 bbb 333 2 ccc 444 3 ddd 555 3 eee authorId AuthorCopy authorId BookCopy

  27. Consistency Correlation Constraint AuthorList_PCT: AuthorCopy: 1 Alice Madison 2 Bob Madison 3 Cedric Seattle authorId 1 2 3 authorId BookCopy: Backend DBMS 111 1 aaa 222 1 bbb 333 2 ccc 444 3 ddd 555 3 eee ALTER VIEW BookCopy ADD CONSISTENCY ROOT

  28. Consistency Correlation Constraint AuthorList_PCT: AuthorCopy: 1 Alice Madison 2 Bob Madison 3 Cedric Seattle authorId 1 2 3 authorId BookCopy: AuthorList_PCT 111 1 aaa 222 1 bbb 333 2 ccc 444 3 ddd 555 3 eee authorId AuthorCopy authorId BookCopy

  29. Cache Schema Example AuthorList_PCT ReviewerList_PCT authorId reviewerId AuthorCopy ReviewerCopy authorId reviewId ReviewCopy BookCopy isbn

  30. Pull-Maintenance • Refresh a region by pulling query results • When refreshing a region, also refresh the affected closure • All overlapping regions • All correlated regions

  31. Pull-Maintenance BookCopy: AuthorList_PCT: 111 1 aaa 222 1 bbb 333 1 ccc 444 3 aaa 555 4 eee 1 3 4 authorId TitleList_CsCT: aaa

  32. Pull-Maintenance AuthorCopy: AuthorList_PCT 1 Alice Madison 3 Cedric Seattle authorId AuthorCopy authorId BookCopy: authorId BookCopy 111 1 aaa 222 1 bbb 333 1 ccc 444 3 aaa 555 3 eee

  33. Inefficient Pulling AuthorCopy: Shared-row problem 1 Alice Madison 3 Cedric Seattle AuthorBookCopy: authorId BookCopy: 1 111 1 222 1 333 3 111 3 555 111 10 aaa 222 20 bbb 333 30 ccc 555 50 eee isbn

  34. Issues • Inefficient pulling: • Calculation of the affected closure requires checking the rows • Efficient pulling: • The affected closure does NOT depend on the instance of a view • Only requires forward pull among correlated views

  35. Definition: (Safe PMV) A partially materialized view V is safe if the following two conditions hold for every instance of the cache that satisfies all integrity constraints: For any pair of regions in V, either they don’t overlap or one is contained in the other. If V is gray, let X denote the set of regions in V defined by presence control-key values. X is a partitioning of V and no pair of regions in X is contained in any one region defined on V. Theoretical Results • Cache schema design rules: • Rule 1: A cache graph is a DAG. • Rule 2: Only red nodes can have independent completeness or consistency control-tables. • Rule 3: Every PMV with more than one parent must be a red circle. • Rule 4: If a PMV has the shared-row problem according to Lemma 5.2, then it cannot be gray. • Rule 5: A PMV cannot have non-compatible control-tables. Syntactically checkable conditions (polynomial) Property for every instance • Theorem: Given a cache schema <W, E>, if it satisfies the design rules, then every PMV in W is safe. Conversely, if the schema violates one of these rules, there is an instance of the cache satisfying all specified integrity constraints in which some PMV is unsafe.

  36. Data Quality-Centric Caching Model: Contributions • Four cache properties • Specifying cache properties • Cache property unit: cache region • Safe views and efficient pulling Provides an abstraction layer (contract) between query processing and cache maintenance

  37. Roadmap • Background • Specifying data quality constraints in SQL • Data quality-centric caching model • Enforcing data quality constraints • Other research • Future directions

  38. Enforcing Data Quality Constraints • Overview • Simple case: View-level consistency [Guo, Larson, Ramakrishnan and Goldstein, SIGMOD 2004] [Guo, Larson, Ramakrishnan and Goldstein, SIGMOD 2004 Demo] Implemented in MS SQL Server code base • General case: Row-level consistency [Guo, Larson and Ramakrishnan, submitted]

  39. Cache Region Metadata Shadow Databases Heartbeat Tables Backend DBMS Queries with Relaxed C&C Requirements Queries Query Optimizer Execution Engine Local Materialized Views Caching DBMS Results Results Extension to MTCache Framework MTCache Framework [Larson et al. 2004]

  40. Simple Case Assumptions • Fully materialized views • Each view is consistent • Push-based maintenance • E.g., MS replication service

  41. Cache Region Metadata Shadow Databases Heartbeat Tables Backend DBMS Queries with Relaxed C&C Requirements Query Optimizer Execution Engine Local Materialized Views Caching DBMS Results Results Extension to MTCache Framework

  42. V 1 V 3 V2 V 4 V 5 C&C Tracking Mechanism • Consistency tracking  cache region (CR) • The unit of update propagation • Data mutually consistent all the time • Properties, e.g., est. delay, est. interval • Currency tracking  heartbeat table V 1 V 3 V2 Backend Cache V 4 V 5 CR1: 12: 10 12: 00 12: 30 12: 30 12: 20 12: 00 CR2:

  43. Shadow Databases Backend DBMS Queries with Relaxed C&C Requirements Queries with Relaxed C&C Requirements Currency Region Metadata Query Optimizer Heartbeat Tables Execution Engine Local Materialized Views The best plan that: • Satisfies consistency requirements • Includes run-time currency checking Caching DBMS Results Results Extension to MTCache Framework

  44. Extension to the Optimizer • Compile-time consistency checking • Run-time currency checking • Cost estimation

  45. Consistency Checking • Enforced at optimization time • Immediately prune a sub-plan if it violates consistency constraints Merge join Q1: σ( Books Reviews) CURRENCY 5 ON (Books, Reviews) Local scanReviews Remote queryon Books

  46. Run-time Currency Checking When view V matches expression E E V Currency Guard SwitchUnion Local plan using V Remote plan requesting E • Currency guard: • Check if local view V satisfies currency requirement

  47. Cost Estimation Cost for the SwitchUnion operator: C = p* Clocal + (1- p) * Cremote + Ccg p : probability that the local branch will be used Clocal :cost of execution of the local branch Cremote : cost of execution of the remote branch Ccg : cost of currency checking

  48. Estimating p Compute p from three parameters: f : estimated refresh interval d : estimated minimal delay B : currency bound 0 if B-d ≤ 0, (B-d)/f if 0 < B-d ≤ f, 1 if B-d > f p =

  49. Changing The Assumptions Fully materialized views Consistent views Push-based maintenance Partially materialized views Row-level consistency Pull-based maintenance • More general algorithms • Run-time check for consistency constraints that can not be validated at compile-time

  50. Run-time C&C Checking When view V matches expression E E Currency Guard SwitchUnion Local plan using V Remote plan requesting E • Currency guard: • Check if local view V satisfies currency requirement

More Related