
LIVE A lineage-supported, versioned DBMS



Presentation Transcript


  1. LIVE: A lineage-supported, versioned DBMS
     Anish Das Sarma, Martin Theobald, Jennifer Widom

  2. Agenda
     • ULDB Data Model and the Trio System
       • Uncertainty & Lineage
     • LIVE Data Model (LDM)
       • Uncertainty, Lineage & Versioning
     • Data Modifications
       • Insert/Delete Tuples, Update Values, Update Confidences
     • Query Evaluation
       • Valid-At vs. Snapshot Queries, Interval Computations, Confidence Computations, Complexity
     • Experiments/Conclusions

  3. ULDB Data Model
     • Different types of uncertainty:
       1. Tuple Alternatives
       2. ‘?’ (Maybe) Annotations
       3. Confidences
     • Implementation of the ULDB data model:
       • Trio System
       • TriQL query language
       • TrioExplorer browser frontend, trioplus client, API
       • Enhanced PostgreSQL backend (SPI)
       • Search for “Stanford Trio”

  4. ULDBs – Alternatives
     1. Alternatives: uncertainty about attribute values
     2. ‘?’ (Maybe) Annotations
     3. Confidences
     (Figure: example relation with three possible worlds.)

  5. ULDBs – Maybe Annotations
     1. Alternatives
     2. ‘?’ (Maybe): uncertainty about tuple presence
     3. Confidences
     (Figure: example relation with a ‘?’ tuple; six possible worlds.)

  6. ULDBs – Confidences
     1. Alternatives
     2. ‘?’ (Maybe) Annotations
     3. Confidences: weighted uncertainty
     (Figure: example relation with a ‘?’ tuple; six possible worlds, each with a probability.)
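The three kinds of uncertainty on slides 4 to 6 can be sketched by enumerating possible worlds directly. This is a minimal illustration, not the Trio implementation: the relation contents and confidence values are hypothetical (the slide's own table did not survive extraction), and the sketch assumes that alternatives within one tuple are mutually exclusive, that different tuples are independent, and that a tuple whose confidences sum to less than 1 is a ‘?’ (maybe) tuple.

```python
from itertools import product

# Hypothetical data: each uncertain tuple is a list of (value, confidence)
# alternatives. The first tuple has three alternatives; the second has one
# alternative with confidence 0.6, so it may be absent with probability 0.4.
saw = [
    [(("Amy", "Mazda"), 0.5), (("Amy", "Toyota"), 0.3), (("Amy", "Honda"), 0.2)],
    [(("Betty", "Acura"), 0.6)],
]

def possible_worlds(xtuples):
    """Return a list of (world, probability) pairs, one per possible world."""
    choices = []
    for alternatives in xtuples:
        options = list(alternatives)
        absent = 1.0 - sum(conf for _, conf in alternatives)
        if absent > 1e-9:  # leftover probability mass: the tuple may be missing
            options.append((None, absent))
        choices.append(options)

    worlds = []
    for combo in product(*choices):  # pick one option per tuple
        world = tuple(value for value, _ in combo if value is not None)
        prob = 1.0
        for _, conf in combo:
            prob *= conf
        worlds.append((world, prob))
    return worlds

worlds = possible_worlds(saw)
for world, prob in worlds:
    print(world, round(prob, 3))
```

With three alternatives and one maybe-tuple, the enumeration yields 3 × 2 = 6 worlds whose probabilities sum to 1, matching the count on the slide.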

  7. ULDBs – Closure
     Suspects = π_person(Saw ⋈ Drives)
     • CANNOT: does not correctly capture possible worlds in the result!
     (Figure: result table with ‘?’ annotations.)

  8. ULDBs – Lineage
     Suspects = π_person(Saw ⋈ Drives)
     (Figure: result table with ‘?’ annotations and lineage.)
     • λ(31) = (11,2) ∧ (21,2)
     • λ(32,1) = (11,1) ∧ (22,1); λ(32,2) = (11,1) ∧ (22,2)
     • λ(33) = (11,1) ∧ 23

  9. ULDBs – Summary
     Uncertainty-Lineage Databases (ULDBs)
     • Alternatives
     • ‘?’ (Maybe) Annotations
     • Confidences
     • Lineage
     • ULDBs are closed and complete

  10. Lineage & Confidences
     • The confidence of a result tuple can be computed from its lineage alone.
       • #P-complete for general Boolean formulas
       • Approximation algorithms: Luby-Karp, etc.
     • Example: Select distinct car from Saw;
       • Result confidence: 0.99
       • λ(21) = (11 ∨ 12 ∨ 13)
       • P(21) = 1 – (1 – 0.8) × (1 – 0.9) × (1 – 0.5) = 0.99
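The arithmetic on this slide can be checked with a short sketch. It assumes the three base tuples are independent, which is what the product form 1 – ∏(1 – p) relies on. The brute-force function is an illustration of why general formulas are expensive (it visits all 2^n truth assignments); it is not one of the approximation algorithms the slide mentions, and both function names are made up for this example.

```python
from itertools import product
from math import prod

def disjunction_confidence(confidences):
    """P(t1 ∨ ... ∨ tn) for independent base tuples: 1 - prod(1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in confidences)

def brute_force_confidence(formula, probs):
    """Sum the probability of every truth assignment that satisfies `formula`.

    probs:   dict mapping base tuple id -> confidence
    formula: callable taking a dict (tuple id -> bool) and returning a bool
    Exponential in the number of base tuples.
    """
    ids = sorted(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(ids)):
        world = dict(zip(ids, bits))
        if formula(world):
            total += prod(probs[i] if world[i] else 1.0 - probs[i] for i in ids)
    return total

# Confidences taken from the slide: tuples 11, 12, 13 with 0.8, 0.9, 0.5.
probs = {11: 0.8, 12: 0.9, 13: 0.5}
closed_form = disjunction_confidence(probs.values())
enumerated = brute_force_confidence(lambda w: w[11] or w[12] or w[13], probs)
print(round(closed_form, 4), round(enumerated, 4))
```

Both routes agree on 0.99 for λ(21) = (11 ∨ 12 ∨ 13).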

  11. Versioning (LDM Data Model)
     • Version intervals for tuples
       • Contiguous version numbers 0, 1, …
       • Database has current version vD
       • Tuples have validity intervals [s, e]
     • Valid-At Queries:
       • Select * from Photo valid-at 2;
     • Snapshot Queries:
       • View Photo at 2;
     • Possible Worlds:
       • LDM databases encode lists of sets of possible worlds.

  12. Data Modifications – Insert
     • Insert Tuple:
       • Insert t with version [vD+1, ∞]
     • commit; increase vD

  13. Data Modifications – Delete
     • Insert Tuple:
       • Insert t with version [vD+1, ∞]
     • Delete Tuple:
       • Set end(t) to vD
     • commit; increase vD

  14. Data Modifications – Update
     • Insert Tuple:
       • Insert t with version [vD+1, ∞]
     • Delete Tuple:
       • Set end(t) to vD
     • Update Value:
       • Set end(t) to vD
       • Insert t’ with version [vD+1, ∞]
     • commit; increase vD

  15. Data Modifications – Update
     • Insert Tuple:
       • Insert t with version [vD+1, ∞]
     • Delete Tuple:
       • Set end(t) to vD
     • Update Value:
       • Set end(t) to vD
       • Insert t’ with version [vD+1, ∞]
     • Update Probability:
       • Set end(t) to vD
       • Insert t’ = t with probability p’ and version [vD+1, ∞]
     • commit; increase vD

  16. Data Modifications – Summary
     • Insert Tuple:
       • Insert t with version [vD+1, ∞]
     • Delete Tuple:
       • Set end(t) to vD
     • Update Value:
       • Set end(t) to vD
       • Insert t’ with version [vD+1, ∞]
     • Update Probability:
       • Set end(t) to vD
       • Insert t’ = t with probability p’ and version [vD+1, ∞]
     • Possible worlds:
       • Updates may create duplicate worlds, which are merged (at any version v).
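The four modification rules can be sketched as a small in-memory table. This is an illustrative model only, not LIVE's storage layer: the class and method names are hypothetical, the database is assumed to start at version 0, each commit is assumed to close exactly one modification, and the open interval end is modelled with `math.inf`.

```python
import math
from dataclasses import dataclass

INF = math.inf  # stands in for an open-ended version interval

@dataclass
class VTuple:
    value: tuple
    prob: float
    start: int
    end: float = INF

class VersionedTable:
    def __init__(self):
        self.vD = 0      # current database version (assumed to start at 0)
        self.rows = []

    def insert(self, value, prob=1.0):
        # Insert t with version [vD+1, inf]
        row = VTuple(value, prob, self.vD + 1)
        self.rows.append(row)
        return row

    def delete(self, row):
        # Set end(t) to vD; the tuple stays stored for older versions
        row.end = self.vD

    def update_value(self, row, new_value):
        # Close the old tuple, insert t' with version [vD+1, inf]
        self.delete(row)
        return self.insert(new_value, row.prob)

    def update_prob(self, row, new_prob):
        # Close the old tuple, insert t' = t with probability p'
        self.delete(row)
        return self.insert(row.value, new_prob)

    def commit(self):
        self.vD += 1

    def valid_at(self, v):
        return [r for r in self.rows if r.start <= v <= r.end]

table = VersionedTable()
a = table.insert(("Amy", "Mazda"), 0.8); table.commit()         # a: [1, inf]
b = table.insert(("Betty", "Honda"), 0.9); table.commit()       # b: [2, inf]
table.delete(a); table.commit()                                 # a: [1, 2]
b2 = table.update_value(b, ("Betty", "Acura")); table.commit()  # b: [2, 3], b2: [4, inf]
b3 = table.update_prob(b2, 0.5); table.commit()                 # b2: [4, 4], b3: [5, inf]
print([r.value for r in table.valid_at(2)])
```

Nothing is ever physically removed in this model: a delete or update only closes an interval, which is what lets a later query ask for the state at an earlier version.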

  17. Query Evaluation
     • Implementation of Q:
       1) Data Computation (regular SQL, including lineage)
       2) Interval Computation (stored procedure)
     • (Figure: diagram relating the implementation to the possible-worlds semantics.)
       • Implementation of Q: D → D + Result
       • Encoding of possible worlds: possible worlds at versions, D1, D2, …, Dn1 @ (0); D1, D2, …, Dn2 @ (1); …; D1, D2, …, Dnv @ (vD)
       • Operational semantics: Q on each world, Q(D1), Q(D2), …, Q(Dn) @ (0), …

  18. Lineage, Confidences & Versions
     • The confidence of any result tuple can be computed from its lineage alone.
     • The version interval of any result tuple can be computed from its lineage alone.

  19. Version Interval Computation
     • Positive Lineage (disjunctions & conjunctions)
       • In the lineage formula λ(t):
         • Replace every tuple t’ by its version interval
         • Replace every ∧ with ∩ and every ∨ with ∪
     • Example: Select distinct car from Saw;
       • Result: [1, ∞] : 0.99
       • λ(21) = (11 ∨ 12 ∨ 13)
       • P(21) = 1 – (1 – 0.8) × (1 – 0.9) × (1 – 0.5)
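The substitution rule for positive lineage can be sketched as a recursive evaluator. The encoding is an assumption made for this example: lineage formulas are nested tuples such as `("or", 11, 12, 13)`, base intervals are `(start, end)` pairs, and the result is a sorted list of disjoint intervals, since a union of intervals need not be contiguous. The base intervals below are hypothetical; only the result `[1, ∞]` for tuple 21 comes from the slide. The pairwise intersection is written for clarity and is not the linear-time procedure the complexity slide refers to.

```python
import math

INF = math.inf

def normalize(intervals):
    """Sort intervals and merge those that overlap or touch on integer versions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def union(a, b):
    return normalize(a + b)

def intersect(a, b):
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s <= e:  # keep only non-empty overlaps
                out.append((s, e))
    return normalize(out)

def version_intervals(lineage, base):
    """Replace each base tuple by its interval, 'and' by ∩, and 'or' by ∪."""
    if not isinstance(lineage, tuple):
        return [base[lineage]]
    op, *args = lineage
    parts = [version_intervals(arg, base) for arg in args]
    result = parts[0]
    for part in parts[1:]:
        result = intersect(result, part) if op == "and" else union(result, part)
    return result

# Hypothetical validity intervals for the base tuples 11, 12, 13.
base = {11: (1, INF), 12: (2, 5), 13: (4, INF)}
print(version_intervals(("or", 11, 12, 13), base))
print(version_intervals(("and", 11, 12), base))
```

The disjunction yields the single interval (1, inf), while the conjunction of 11 and 12 is only valid on (2, 5).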

  20. Version & Confidence Computation
     • Positive Lineage (disjunctions & conjunctions)
       • In the lineage formula λ(t):
         • Replace every tuple t’ by its version interval
         • Replace every ∧ with ∩ and every ∨ with ∪
     • Example: Select distinct car from Saw;
       Select distinct car from Saw valid-at 2;
       • Result: [1, ∞] : 0.98
       • λ(21) = (11 ∨ 12)
       • P(21) = 1 – (1 – 0.8) × (1 – 0.9)
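The drop from 0.99 to 0.98 under `valid-at 2` can be reproduced with a short sketch: only base tuples whose interval contains the requested version contribute to the disjunction. The confidences 0.8, 0.9 and 0.5 come from the slides; the version intervals are hypothetical, chosen so that tuple 13 is not yet valid at version 2, and the base tuples are assumed to be independent.

```python
from math import inf, prod

# tuple id -> (confidence, (start, end)); the intervals are illustrative only
saw = {
    11: (0.8, (1, inf)),
    12: (0.9, (2, inf)),
    13: (0.5, (3, inf)),
}

def valid_at_confidence(lineage_ids, tuples, v):
    """Confidence of a disjunctive lineage restricted to tuples valid at version v."""
    live = [
        conf
        for conf, (start, end) in (tuples[i] for i in lineage_ids)
        if start <= v <= end
    ]
    return 1.0 - prod(1.0 - p for p in live)

at_2 = valid_at_confidence([11, 12, 13], saw, 2)  # lineage shrinks to (11 ∨ 12)
at_3 = valid_at_confidence([11, 12, 13], saw, 3)  # all three tuples are valid
print(round(at_2, 4), round(at_3, 4))
```

At version 2 the lineage reduces to (11 ∨ 12), which gives 1 – 0.2 × 0.1 = 0.98 as on the slide.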

  21. Interval Computations & Query Plans
     • Can decouple interval computation from data computation
     • Or: push interval computation into query plans → only when there is no negation.
     • Example with negation:
       Select R.A from R EXCEPT ( Select R.A from R EXCEPT Select S.A from S );
       (Figure: plan annotated with r=(a)[0,10], s=(a)[5,15], u=(a)[0,10], t=(a)[0,10].)
     • Example without negation:
       Select R.A from R,S Where R.A=S.A;
       (Figure: plan annotated with r=(a)[0,10], s=(a)[5,15], t=(a)[5,10].)

  22. Complexity Results
     • Positive Lineage (disjunctions & conjunctions)
       • Version interval computation
         • PTIME (linear)
       • Confidence computation
         • #P-complete
     • Arbitrary Lineage (including negation)
       • Version interval computation
         • PTIME (linear) if all confidences are known
         • NP-hard if confidences are not known (need to check for idempotence of negated tuples)
       • Confidence computation
         • #P-complete

  23. Experiments – Setup
     • Probabilistic & versioned TPC-H setting
       • Queries over Lineitem, Orders tables with varying join selectivity from 0.1% to 1% (6,000-60,000 and 1,500-15,000 tuples for Lineitem & Orders)
       • Update 0.1% to 1% of the input data
       • Assign probabilities drawn uniformly at random from [0,1] to tuples
     • Additional indexes for versioning
       • Two B+-trees on (start, end) and end points of intervals
       • Rewrite valid-at & snapshot queries using WHERE (start ≤ v ≤ end) predicates
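The rewrite of a valid-at query into a plain range predicate can be sketched with Python's built-in `sqlite3` module. This only illustrates the idea and is not the experimental setup, which used an enhanced PostgreSQL backend. The table contents, the column names `v_start` and `v_end`, and the integer sentinel for an open interval end are assumptions made for this example; SQLite's ordinary indexes stand in for the two B+-trees.

```python
import sqlite3

OPEN_END = 2**31 - 1  # hypothetical sentinel for an open-ended interval

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photo (id INTEGER, label TEXT, v_start INTEGER, v_end INTEGER)"
)
# Two indexes, mirroring the slide: one on (start, end), one on the end points.
conn.execute("CREATE INDEX photo_start_end ON photo (v_start, v_end)")
conn.execute("CREATE INDEX photo_end ON photo (v_end)")
conn.executemany(
    "INSERT INTO photo VALUES (?, ?, ?, ?)",
    [
        (1, "beach", 1, 2),
        (2, "city", 2, OPEN_END),
        (3, "forest", 4, OPEN_END),
    ],
)

def valid_at(conn, v):
    """Rewrite of 'Select * from Photo valid-at v' as a WHERE predicate."""
    cursor = conn.execute(
        "SELECT id, label FROM photo "
        "WHERE v_start <= ? AND ? <= v_end ORDER BY id",
        (v, v),
    )
    return cursor.fetchall()

print(valid_at(conn, 2))
print(valid_at(conn, 4))
```

Because the predicate is an ordinary range condition, the query optimizer can use the version indexes without any special support for intervals.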

  24. Experiments – Results (I)
     • Join query: overhead of versioned system vs. non-versioned system (versions not computed), in %
     • Join query: overhead of computing versions (versioned system)

  25. Experiments – Results (II)
     • Join query: progressive data updates (overwrite multiple times)
     • Join query: valid-at queries vs. full version computation

  26. Experiments – Results (III)
     • Overhead of version computation, different query types (1% data modified)

  27. Conclusions
     • LDMs are closed and complete
     • Generalizes to full ULDB data model (including value alternatives & maybe (?) annotations)
     • Can employ lineage also for update propagations
     • Supports all of INSERT/DELETE/UPDATE with INTERSECT/UNION/EXCEPT set operations
     (Figure labels: Uncertainty, Versioning, Lineage, DBMS.)
