Managing Uncertain Information and Scaling it Up to Concurrent Transactions
Alfredo Cuzzocrea*, Rubén de Juan-Marín**, Hendrik Decker**, Francesc Muñoz-Escoí**
*ICAR-CNR and Univ. Calabria, Cosenza, Italy
**Instituto Tecnológico de Informática, Valencia
Overview
• Semantic integrity constraints for modeling uncertain data
• Why traditional integrity checking is inappropriate for managing uncertain data
• Inconsistency-tolerant integrity checking for managing uncertain data
• Outlook and conclusion
Overview, details
• Uncertain Data lack Certainty, Quality, Trustworthiness, Security, Integrity, …
• Model, Monitor, Maintain properties by checking/repairing integrity constraints
• Inconsistency/Uncertainty Tolerance
• Inconsistency/Uncertainty Metrics
• Metric-based Uncertainty Checking
• Metric-based Repairs of Uncertainty
• Conclusion and Outlook
Integrity Constraints model Properties
• Semantic integrity constraints in SQL: assertions
• Each data property can be modeled as an assertion, hence all desirable properties can be modeled that way too
• Simple positive properties are usually captured by facts
• Negative and complex positive properties are captured by constraints
• Negative constraints model what must not be (denials)
• The difference between “integrity constraint” and “desirable property constraint” seems to be merely terminological
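As a minimal sketch of how a denial models what must not be, the Python below (an illustration, not the authors' method) represents a database as a set of ground facts p(x, y) and the anti-reflexivity denial ← p(x, x), used in the example slides, as a function that returns its violated instances. In SQL, the same denial could be written as an assertion along the lines of `CREATE ASSERTION p_antireflexive CHECK (NOT EXISTS (SELECT * FROM p WHERE p.x = p.y))`, where the table p and its columns x, y are hypothetical. The names `Fact`, `Database`, `violated_cases` and `holds` are likewise illustrative assumptions.

```python
from typing import Callable, FrozenSet, Set, Tuple

Fact = Tuple[str, str]        # ground fact p(x, y), stored as the pair (x, y)
Database = FrozenSet[Fact]    # a database state: a finite set of facts
Denial = Callable[[Database], Set[Fact]]

def violated_cases(db: Database) -> Set[Fact]:
    """Denial <- p(x, x): each stored fact p(t, t) is a violated case."""
    return {(x, y) for (x, y) in db if x == y}

def holds(denial: Denial, db: Database) -> bool:
    """A denial is satisfied exactly when it has no violated cases."""
    return not denial(db)

db: Database = frozenset({("a", "b"), ("b", "c"), ("c", "c")})
print(violated_cases(db))         # {('c', 'c')}
print(holds(violated_cases, db))  # False
```

Returning the violated instances, rather than a single true/false flag, is what later makes inconsistency-tolerant checking possible: individual cases can be compared before and after an update.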
Difference between integrity constraints and desirable constraints
• Integrity constraints must never be violated
• Desirable constraints may be violated
• Consequence: (un)certainty cannot be monitored by traditional integrity checking methods, because they are inconsistency-intolerant
Intolerance of the Total Integrity Premise
• All methods to check or repair integrity require all constraints to be 100% satisfied before the update (checking) or after it (repairing)
• Most databases are not 100% consistent
• Inconsistency tolerance is needed for checking/repairing uncertainty!
• Result (with Davide Martinenghi): most (but not all) methods are inconsistency/uncertainty-tolerant
Inconsistency-tolerant Integrity Checking
• D = {p(a,b), p(b,c), p(c,b), p(c,c), p(c,d), p(d,b), p(d,e), ...}
• I = ← p(x,x) (p must be anti-reflexive)
• Ignore extant integrity violations: D violates one instance (“case”) of I, namely p(c,c)
• U = insert p(a,c) does not harm quality: all instances of I satisfied in D remain satisfied in D^U. Thus, U is OK, although D(I) = D^U(I) = violated.
• Reject integrity violations caused by the update: U′ = insert p(a,a) violates I, thus U′ is rejected.
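The contrast between the total integrity premise and inconsistency-tolerant checking on this slide can be sketched as follows. The sketch assumes that the listed facts are all of D (the slide's "..." is left out) and treats a case of I as a ground instance ← p(t, t). The function names are hypothetical: `total_check` stands for any method that demands full satisfaction of I after the update, and `tolerant_check` accepts an update if and only if it introduces no new violated case.

```python
from typing import FrozenSet, Set, Tuple

Fact = Tuple[str, str]
Database = FrozenSet[Fact]

def violated_cases(db: Database) -> Set[Fact]:
    # Violated cases of the denial I = <- p(x, x).
    return {(x, y) for (x, y) in db if x == y}

def insert(db: Database, fact: Fact) -> Database:
    # The update U = insert fact, producing the new state D^U.
    return db | {fact}

def total_check(db: Database, fact: Fact) -> bool:
    # Inconsistency-intolerant: D^U must satisfy I completely.
    return not violated_cases(insert(db, fact))

def tolerant_check(db: Database, fact: Fact) -> bool:
    # Inconsistency-tolerant: every case satisfied in D stays satisfied in D^U,
    # i.e. the violated cases of D^U are a subset of those of D.
    return violated_cases(insert(db, fact)) <= violated_cases(db)

D: Database = frozenset({("a", "b"), ("b", "c"), ("c", "b"), ("c", "c"),
                         ("c", "d"), ("d", "b"), ("d", "e")})

print(total_check(D, ("a", "c")))     # False: p(c,c) is still violated
print(tolerant_check(D, ("a", "c")))  # True: no new violated case
print(tolerant_check(D, ("a", "a")))  # False: new violated case p(a,a)
```

Only `tolerant_check` reproduces the slide's verdicts; `total_check` would reject U merely because of the extant violation p(c,c).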
Metric-based Uncertainty Checking
• An update U maps the old state (D, I) to the new state (D^U, I)
• An uncertainty metric (μ, ≤) maps pairs (D, I) to a lattice, partially ordered by ≤
• An update U is OK only if uncertainty does not go up: μ(D^U, I) ≤ μ(D, I)
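A sketch of metric-based checking, assuming two hypothetical metrics for the anti-reflexivity denial I = ← p(x, x): `mu_cases` maps D to its set of violated cases (a powerset lattice ordered by ⊆), and `mu_count` maps D to the number of violated cases (the natural numbers ordered by ≤). Because I is fixed here, each μ takes only the database. Python's `<=` implements both orders (subset for frozensets, numeric for ints), so `operator.le` serves as ≤ in both cases. The small database and the update are illustrative, not taken from the slides.

```python
import operator
from typing import Callable, FrozenSet, Tuple, TypeVar

Fact = Tuple[str, str]
Database = FrozenSet[Fact]
M = TypeVar("M")  # the metric's value space (a lattice)

def mu_cases(db: Database) -> FrozenSet[Fact]:
    # Hypothetical metric: the set of violated cases of I = <- p(x, x).
    return frozenset(f for f in db if f[0] == f[1])

def mu_count(db: Database) -> int:
    # Hypothetical metric: how many cases of I are violated.
    return len(mu_cases(db))

def metric_ok(mu: Callable[[Database], M], leq: Callable[[M, M], bool],
              db: Database, db_u: Database) -> bool:
    # Accept the update only if uncertainty does not go up: mu(D^U) <= mu(D).
    return leq(mu(db_u), mu(db))

D: Database = frozenset({("a", "b"), ("c", "c"), ("c", "d")})
# U swaps one violation for another: delete p(c,c), insert p(a,a).
D_U: Database = (D - {("c", "c")}) | {("a", "a")}

print(metric_ok(mu_count, operator.le, D, D_U))  # True: still one violation
print(metric_ok(mu_cases, operator.le, D, D_U))  # False: p(a,a) is a new case
```

The two verdicts differ, which illustrates that the choice of (μ, ≤) determines which updates count as harmless.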
Metric-based Repairs of Uncertainty
• Let (μ, ≤) be an uncertainty metric that maps pairs (D, I) to a lattice, partially ordered by ≤
• An update U is a metric-bounded partial repair if uncertainty decreases: μ(D^U, I) < μ(D, I)
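Under a hypothetical set-of-violated-cases metric for I = ← p(x, x), a metric-bounded partial repair is an update that strictly shrinks uncertainty. For frozensets, Python's `<` is proper subset, i.e. the strict version of ⊆, so `operator.lt` plays the role of <. The database below, with two violated cases, is an illustrative assumption, and `is_partial_repair` is a hypothetical name.

```python
import operator
from typing import Callable, FrozenSet, Tuple, TypeVar

Fact = Tuple[str, str]
Database = FrozenSet[Fact]
M = TypeVar("M")

def mu_cases(db: Database) -> FrozenSet[Fact]:
    # Hypothetical metric: violated cases of I = <- p(x, x), ordered by subset.
    return frozenset(f for f in db if f[0] == f[1])

def is_partial_repair(mu: Callable[[Database], M], lt: Callable[[M, M], bool],
                      db: Database, db_u: Database) -> bool:
    # U is a metric-bounded partial repair iff mu(D^U) < mu(D) strictly.
    return lt(mu(db_u), mu(db))

# Hypothetical database with two violated cases, p(c,c) and p(d,d).
D: Database = frozenset({("a", "b"), ("c", "c"), ("d", "d")})

print(is_partial_repair(mu_cases, operator.lt, D, D - {("c", "c")}))  # True
print(is_partial_repair(mu_cases, operator.lt, D, D | {("a", "c")}))  # False
print(is_partial_repair(mu_cases, operator.lt, D,
                        (D - {("c", "c")}) | {("a", "a")}))           # False
```

Deleting only p(c,c) already qualifies as a partial repair, even though D^U still violates I through p(d,d).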
Isolated Integrity of Concurrent Transactions
• Well-known result: Isolated Integrity + Serializability ⇒ Concurrent Integrity
• Problems:
  * Concurrency may be continuous (non-stop)
  * Integrity/certainty is rarely 100% satisfied
  * Certainty depends on foreign transactions
• Solution: uncertainty tolerance scales up to concurrency
Caveat
• Generalisation of the well-known result: Isolated Integrity of Satisfied Cases + Serializability ⇒ Concurrent Integrity of Satisfied Instances
• Problem: relaxed isolation levels (e.g. snapshot isolation), common in most DBMSs, compromise serializability
• Solution: we are working on it.
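Why relaxed isolation is a problem can be sketched with write skew, a well-known snapshot isolation anomaly. The sketch assumes a hypothetical two-literal denial I2 = ← p(x,y), p(y,x), x ≠ y, which is not on the slides (the single-literal denial ← p(x,x) cannot exhibit write skew). It also assumes a deliberately simplified model of snapshot isolation in which all transactions read the same initial snapshot and, because their write sets are disjoint, all of them commit. Each transaction runs the inconsistency-tolerant check from the earlier example slide; all function names are illustrative.

```python
from typing import FrozenSet, List, Set, Tuple

Fact = Tuple[str, str]
Database = FrozenSet[Fact]

def violated_cases(db: Database) -> Set[Tuple[str, ...]]:
    # Hypothetical denial I2 = <- p(x, y), p(y, x), x != y.
    # Each violated case is reported once, as the sorted pair of its constants.
    return {tuple(sorted((x, y))) for (x, y) in db if x != y and (y, x) in db}

def tolerant_ok(before: Database, after: Database) -> bool:
    # Inconsistency-tolerant check: no case satisfied in `before` is violated in `after`.
    return violated_cases(after) <= violated_cases(before)

def run_serial(db: Database, inserts: List[Fact]) -> Database:
    # Serial schedule: each transaction sees the state committed by its predecessors.
    for fact in inserts:
        candidate = db | {fact}
        if tolerant_ok(db, candidate):
            db = candidate
    return db

def run_snapshot(db: Database, inserts: List[Fact]) -> Database:
    # Simplified snapshot isolation: every transaction checks against the same
    # initial snapshot; the write sets are disjoint, so no write-write conflict
    # aborts anyone and every transaction that passes its check commits.
    accepted = [f for f in inserts if tolerant_ok(db, db | {f})]
    return db | frozenset(accepted)

# Extant violation p(c,d), p(d,c); T1 inserts p(a,b), T2 inserts p(b,a).
D0: Database = frozenset({("c", "d"), ("d", "c")})
txns = [("a", "b"), ("b", "a")]

print(sorted(violated_cases(run_serial(D0, txns))))    # [('c', 'd')]
print(sorted(violated_cases(run_snapshot(D0, txns))))  # [('a', 'b'), ('c', 'd')]
```

In the serial run, the extant violation p(c,d) is tolerated and only the newly caused one is prevented. In the snapshot run, both transactions pass their isolated checks, yet the case (a,b), satisfied before, is violated afterwards.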
Conclusion and Outlook
● Model desirable properties by integrity constraints
● Most methods are uncertainty-tolerant, so uncertainty tolerance comes for free!
● Inconsistency and uncertainty are syntactic properties in datalog
● Ongoing: uncertainty-tolerant query answering
● Future work: apply uncertainty-tolerant integrity checking to explanations, schema evolution, and concurrent transactions in replicated databases