
Uncertainty Lineage Data Bases

Uncertainty Lineage Data Bases. Very Large Data Bases. 1975. 2006. UNCERTAINTY. LINEAGE. DATA. ULDBs: Databases with Uncertainty and Lineage. Omar Benjelloun, Anish Das Sarma, Alon Halevy, Jennifer Widom, Stanford InfoLab.


Presentation Transcript


  1. Uncertainty Lineage Data Bases Very Large Data Bases 1975 2006

  2. UNCERTAINTY LINEAGE DATA ULDBs: Databases with Uncertainty and Lineage Omar Benjelloun, Anish Das Sarma, Alon Halevy, Jennifer Widom Stanford InfoLab

  3. Motivation • Many applications involve data that is uncertain (approximate, probabilistic, inexact, incomplete, imprecise, fuzzy, inaccurate, ...) • Many of the same applications need to track the lineage of their data • Neither uncertainty nor lineage is supported by conventional DBMSs • Coincidence or Fate?

  4. Sample Applications Needing Uncertainty and Lineage • Scientific databases • Sensor databases • Data cleaning • Data integration • Information extraction

  5. Trio Project Building a new kind of DBMS in which: • Data • Uncertainty • Lineage are all first-class interrelated concepts

  6. Coincidence or Fate? Lineage and Uncertainty • Lots of independent work in lineage and uncertainty (related work at end of talk) • Turns out: The connection between uncertainty and lineage goes deeper than just a shared need by several applications

  7. Lineage and Uncertainty • Lineage... • Enables simple and consistent representation of uncertain data • Correlates uncertainty in query results with uncertainty in the input data • Can make computation over uncertain data more efficient

  8. Outline of the Talk • The ULDB data model • Querying ULDBs • ULDB properties • Membership and extraction operations • Confidences • Current, related, and future work

  9. Running Example: Crime Solver • Saw(witness, car) • Drives(person, car) • Suspects(person) = π_person(Saw ⋈ Drives)

  10. Uncertainty • An uncertain database represents a set of possible instances. Examples: • Amy saw either a Honda or a Toyota • Jimmy drives a Toyota, a Mazda, or both • Betty saw an Acura with confidence 0.5 or a Toyota with confidence 0.3 • Hank is a suspect with confidence 0.7

  11. Uncertainty in a ULDB 1. Alternatives 2. ‘?’ (Maybe) Annotations 3. Confidences

  12. Uncertainty in a ULDB 1. Alternatives: uncertainty about value 2. ‘?’ (Maybe) Annotations 3. Confidences (Figure: example table with alternatives; three possible instances)

  13. Uncertainty in a ULDB 1. Alternatives 2. ‘?’ (Maybe): uncertainty about presence 3. Confidences (Figure: example table with a ‘?’ annotation; six possible instances)

  14. Uncertainty in a ULDB 1. Alternatives 2. ‘?’ (Maybe) Annotations 3. Confidences: weighted uncertainty (Figure: example table with a ‘?’ annotation and confidences; six possible instances, each with a probability)
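
The three constructs on slides 12 to 14 can be sketched in a few lines of Python. This is only an illustration, not Trio's implementation. It reuses the Amy and Betty examples from slide 10, but Amy's confidences (0.5 each) are an assumption, as is the convention that a tuple whose confidences sum to less than 1 is a maybe-tuple that is absent with the remaining probability. The function name `possible_instances` is hypothetical.

```python
from itertools import product

def possible_instances(xrelation):
    """Enumerate (instance, probability) pairs for one uncertain relation.

    xrelation is a list of tuples-with-alternatives; each one is a list of
    (value, confidence) pairs. Assumption: if the confidences sum to less
    than 1, the tuple is a maybe-tuple ('?') and is absent with the rest.
    """
    per_tuple = []
    for alternatives in xrelation:
        options = list(alternatives)
        absent = 1.0 - sum(conf for _, conf in alternatives)
        if absent > 1e-9:
            options.append((None, absent))  # None stands for "tuple absent"
        per_tuple.append(options)

    result = []
    for combo in product(*per_tuple):  # one choice per tuple
        instance = frozenset(value for value, _ in combo if value is not None)
        probability = 1.0
        for _, conf in combo:
            probability *= conf  # tuples are assumed independent
        result.append((instance, probability))
    return result

# Amy saw a Honda or a Toyota (confidences assumed);
# Betty saw an Acura (0.5) or a Toyota (0.3), or nothing at all (0.2).
saw = [
    [(("Amy", "Honda"), 0.5), (("Amy", "Toyota"), 0.5)],
    [(("Betty", "Acura"), 0.5), (("Betty", "Toyota"), 0.3)],
]
instances = possible_instances(saw)
```

Two choices for Amy times three for Betty (two alternatives plus absence) give six possible instances whose probabilities sum to 1.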

  15. Data Models for Uncertainty • Our model (so far) is not especially new • We spent some time exploring the space of models for uncertainty [ICDE 2006] • Tension between understandability and expressiveness • Our model is understandable • But it is not complete, or even closed under common operations

  16. Closure and Completeness • Completeness: Can represent all sets of possible instances • Closure: Can represent results of operations • Note: Completeness ⇒ Closure

  17. Model (so far) Not Closed • Suspects = π_person(Saw ⋈ Drives) • (Figure: candidate result table with three ‘?’ tuples, marked CANNOT) • Does not correctly capture possible instances in the result

  18. Lineage to the Rescue • Lineage: “where data came from” • Internal lineage • External lineage (not covered in this talk) • In ULDBs: A function λ from alternatives to sets of alternatives (or external sources)

  19. Example with Lineage • Suspects = π_person(Saw ⋈ Drives) • (Figure: result table with three ‘?’ tuples and their lineage) • λ(31) = (11,2),(21,2) • λ(32,1) = (11,1),(22,1); λ(32,2) = (11,1),(22,2) • λ(33) = (11,1), 23 • Correctly captures possible instances in the result
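
A small Python sketch may help show why the lineage matters. The tuple identifiers and lineage sets are the ones on slide 19; the names and cars are assumptions, since the slide's tables did not survive in this transcript. A Suspects alternative is taken to be present exactly when every alternative in its lineage is chosen. The single-alternative tuple 23 is written `(23, 1)`.

```python
from itertools import product

# Base tuples: id -> alternatives (values are illustrative assumptions).
BASE = {
    11: [("Cathy", "Honda"), ("Cathy", "Mazda")],   # Saw
    21: [("Jimmy", "Toyota"), ("Jimmy", "Mazda")],  # Drives
    22: [("Billy", "Honda"), ("Frank", "Honda")],   # Drives
    23: [("Hank", "Honda")],                        # Drives
}

# Suspects alternatives with the lineage given on slide 19.
SUSPECTS = {
    (31, 1): ("Jimmy", [(11, 2), (21, 2)]),
    (32, 1): ("Billy", [(11, 1), (22, 1)]),
    (32, 2): ("Frank", [(11, 1), (22, 2)]),
    (33, 1): ("Hank", [(11, 1), (23, 1)]),
}

def instances_with_lineage():
    """Possible Suspects instances when lineage ties them to the base data."""
    ids = sorted(BASE)
    found = set()
    for picks in product(*(range(1, len(BASE[i]) + 1) for i in ids)):
        chosen = set(zip(ids, picks))  # one alternative per base tuple
        found.add(frozenset(
            person for person, lineage in SUSPECTS.values()
            if all(parent in chosen for parent in lineage)))
    return found

def instances_without_lineage():
    """Same table read as three independent '?' tuples (slide 17's model)."""
    xtuples = [["Jimmy", None], ["Billy", "Frank", None], ["Hank", None]]
    return {frozenset(p for p in combo if p is not None)
            for combo in product(*xtuples)}

with_lineage = instances_with_lineage()
without_lineage = instances_without_lineage()
```

With lineage there are four possible instances. Without it the same table admits twelve, including ones where Jimmy and Hank are both suspects even though the witness saw only one car.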

  20. ULDBs • Alternatives • ‘?’ (Maybe) Annotations • Confidences • Lineage ULDBs are Closed and Complete

  21. Outline of the Talk • The ULDB data model • Querying ULDBs • ULDB properties • Membership and extraction operations • Confidences • Current, related, and future work

  22. Querying ULDBs • Query Q on ULDB D • (Diagram) Implementation of Q: D → D’ = D + Result • Possible instances of D: D1, D2, …, Dn • Q on each instance: Q(D1), Q(D2), …, Q(Dn) • D’ is a representation of the instances Q(D1), Q(D2), …, Q(Dn)
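
The definition on this slide, applying Q to every possible instance, can be written down directly. The sketch below does that for the running query on a tiny database; the data values and the helper names `instances` and `suspects` are assumptions for illustration. It is the reference semantics only: a real system would compute the result on the representation rather than enumerate instances.

```python
from itertools import product

def instances(xrelation):
    """All possible instances of a relation whose tuples have alternatives
    (no '?' here, so every tuple contributes exactly one alternative)."""
    return [set(choice) for choice in product(*xrelation)]

def suspects(saw, drives):
    """Q = project person from (Saw join Drives on car), on ordinary relations."""
    return frozenset(person
                     for (_, seen_car) in saw
                     for (person, driven_car) in drives
                     if seen_car == driven_car)

# Illustrative data (assumed): one witness, two drivers.
saw = [[("Cathy", "Honda"), ("Cathy", "Mazda")]]
drives = [[("Jimmy", "Toyota"), ("Jimmy", "Mazda")], [("Hank", "Honda")]]

# Q(D1), Q(D2), ..., Q(Dn): run the ordinary query on each possible instance.
result_instances = {suspects(s, d)
                    for s in instances(saw)
                    for d in instances(drives)}
```

Here the four base instances collapse to three distinct result instances: only Hank, only Jimmy, or no suspect.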

  23. Well-Behaved ULDBs • If we start with a well-behaved ULDB and perform standard queries, it remains well-behaved • Intuitively (details in paper): • Acyclic: No cycles in the lineage • Deterministic: Non-empty lineages of distinct alternatives are distinct • Uniform: Alternatives of the same tuple are derived from the same set of tuples

  24. ULDB Minimality • Data-minimality • Does every alternative appear in some possible instance? (no extraneous alternatives) • Is every maybe-tuple in R absent from some possible instance? (no extraneous ‘?’s) • Lineage-minimality

  25. Data-Minimality Examples: Extraneous ‘?’ • λ(20,1)=(10,1); λ(20,2)=(10,2) • (Figure: tables with one ‘?’ marked extraneous)

  26. Data-Minimality Examples: Extraneous alternative • (Figure: tables with ‘?’ annotations; one alternative marked extraneous)

  27. Data-Minimization • Extraneous alternative theorem: • An alternative is extraneous iff it is (possibly transitively) derived from multiple alternatives of the same tuple • Extraneous “?” theorem: • A “?” on tuple t is extraneous iff • it is derived from base tuples without “?” • t has as many alternatives as the product of the numbers of alternatives in its base tuples • Minimization algorithm based on the theorems (see paper)
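
The extraneous-alternative theorem translates into a short lineage traversal. The sketch below is one possible reading, not the paper's minimization algorithm: it collects everything an alternative is transitively derived from and reports a conflict if two different alternatives of the same tuple show up. Alternatives are `(tuple_id, index)` pairs, the lineage is assumed acyclic, and the example lineage is made up.

```python
def transitive_lineage(alternative, lineage):
    """All alternatives reachable from `alternative` by following lineage."""
    seen = set()
    stack = [alternative]
    while stack:
        current = stack.pop()
        for parent in lineage.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def is_extraneous(alternative, lineage):
    """True iff the alternative is (possibly transitively) derived from
    two different alternatives of the same tuple."""
    index_by_tuple = {}
    for tuple_id, index in transitive_lineage(alternative, lineage):
        if index_by_tuple.setdefault(tuple_id, index) != index:
            return True
    return False

# Hypothetical lineage: (40,1) needs (10,1) via (30,1) and also (10,2).
lineage = {
    (30, 1): [(10, 1), (20, 1)],
    (40, 1): [(30, 1), (10, 2)],
}
```

In this example `(40, 1)` can never appear, because tuple 10 cannot take alternatives 1 and 2 at the same time, while `(30, 1)` is fine.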

  28. ULDB Properties and Operations • (Diagram relating the properties Data-minimal and Lineage-minimal to the operations Queries, Data-minimize, Lineage-minimize, Membership, and Extraction)

  29. Membership Questions (R has possible instances I1, I2, …, In) • Does a given tuple t appear in some (all) possible instance(s) of R? • Polynomial algorithms based on data-minimization • Is a given table T one of (all of) the possible instances of R? • NP-hard

  30. Extraction • (Figure: tables Saw, Drives, Eats, and Suspects) • Extraction algorithm in paper

  31. Outline of the Talk • The ULDB data model • Querying ULDBs • ULDB properties • Membership and extraction operations • Confidences • Current, related, and future work

  32. Confidences • Confidences supplied with base data • Trio computes confidences on query results • Default probabilistic interpretation • Can choose to plug in different arithmetic • (Figure: example comparing the Probabilistic and Min interpretations; values shown: 0.3, 0.4, 0.6)

  33. Query Processing with Confidences • Previous approach (probabilistic databases) • Each operator computes confidences during query execution • Only certain query plans allowed • In ULDBs • Confidence of alternative A is function of confidences in its transitive lineage • Our approach: Decouple data and confidence computation • Use any query plan for data computation • Compute confidences on-demand using lineage • Can give arbitrarily large improvements
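
The decoupling on this slide can be illustrated with a small sketch: the data and its lineage are computed first, and a confidence is derived later, on demand, from the base alternatives in the transitive lineage. This is not Trio's algorithm. It assumes purely conjunctive lineage, independent base tuples, and no extraneous alternatives; the confidences, the extra alternative `(41, 1)`, and the function names are made up. The `combine` parameter stands in for the pluggable arithmetic mentioned on slide 32, and the cache plays the role of the memoization listed under current work.

```python
from functools import lru_cache
from math import prod

# Hypothetical base confidences and lineage.
BASE_CONF = {(11, 1): 0.5, (11, 2): 0.5, (23, 1): 0.6}
LINEAGE = {
    (33, 1): ((11, 1), (23, 1)),
    (41, 1): ((33, 1), (11, 1)),  # reaches (11,1) along two paths
}

@lru_cache(maxsize=None)
def base_alternatives(alternative):
    """Set of base alternatives in the transitive lineage (memoized)."""
    parents = LINEAGE.get(alternative)
    if not parents:
        return frozenset([alternative])
    return frozenset().union(*(base_alternatives(p) for p in parents))

def confidence(alternative, combine=prod):
    """Confidence computed on demand from base confidences.

    combine=prod gives the probabilistic interpretation under the stated
    independence assumption; combine=min is one alternative arithmetic.
    """
    return combine(BASE_CONF[b] for b in base_alternatives(alternative))
```

Because the base alternatives are collected as a set, `(41, 1)` gets the same probabilistic confidence as `(33, 1)`: the shared alternative `(11, 1)` is counted once, which an operator-by-operator multiplication would get wrong.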

  34. Current Work: Algorithms • Algorithms: confidence computation, extraneous data, membership questions • Minimize lineage traversal • Memoization • Batch computations

  35. The Trio Trio • Data Model • ULDBs (Coming: incomplete relations; continuous uncertainty; correlation uncertainty) • Query Language: TriQL • Simple extension to SQL • Query uncertainty, confidences, and lineage • System • Did you see our demo? • Version 1: Entirely on top of conventional DBMS • Surprisingly easy and complete, reasonably efficient

  36. Brief Related Work • Uncertainty • Modeling • C-tables [IL84], Probabilistic Databases [CP87], using Nested Relations [F90] • Systems • ProbView [LLRS97], MYSTIQ [BDM+05], ORION [CSP05], Trio [BDHW05] • Lineage • DBNotes [CTV05], Data Warehouses [CW03]

  37. UNCERTAINTY LINEAGE DATA but don’t forget the lineage… Thank You Search “stanford trio” (or, http://i.stanford.edu/trio)
