
Cardinality-based Inference Control in OLAP Systems: An Information Theoretic Approach



Presentation Transcript


  1. Cardinality-based Inference Control in OLAP Systems: An Information Theoretic Approach • Nan Zhang, Texas A&M University • This is joint work with Dr. Wei Zhao and Dr. Jianer Chen

  2. Privacy Concern • Growing Privacy Concern in Database Applications on the Internet (e.g., Data Mining) • 17% privacy fundamentalists, 56% pragmatic majority, 27% marginally concerned (AT&T Survey) • Challenge: Can we build accurate models of the aggregate data without access to the precise values of individual data?

  3. Inference Control Problem Definition • Will the application invade privacy? [Diagram labels: Application (Data Miner), Query Q, Answer/reject, OLAP Server, Randomization, Data Providers …]

  4. Inference Problem [Diagram labels: Queries, Public information, Sensitive information]

  5. Inference Problem • SU = 20 • S1+S3-SB-ST = 87
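
A minimal Python sketch of how combining answered sum queries can expose a single sensitive cell. The cell layout and the query set are hypothetical and are not taken from the slides. Each query is written as a 0/1 vector over the cells, and a cell is treated as inferable when its unit vector lies in the row space of the answered queries, which is checked here by exact Gaussian elimination.

```python
from fractions import Fraction


def reduced_row_echelon(rows):
    """Return the nonzero rows of the reduced row-echelon form of `rows`."""
    rows = [[Fraction(v) for v in row] for row in rows]
    num_cols = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(num_cols):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next(
            (r for r in range(pivot_row, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        scale = rows[pivot_row][col]
        rows[pivot_row] = [v / scale for v in rows[pivot_row]]
        # Clear this column in every other row.
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows[:pivot_row]


def inferable_cells(queries):
    """Indices of cells whose exact value follows from the sum-query answers.

    A cell is inferable when its unit vector is a linear combination of the
    query vectors; in reduced row-echelon form this means some row has
    exactly one nonzero entry.
    """
    found = []
    for row in reduced_row_echelon(queries):
        nonzero = [i for i, v in enumerate(row) if v != 0]
        if len(nonzero) == 1:
            found.append(nonzero[0])
    return sorted(found)


# Hypothetical 2 x 3 cube with one empty cell: x0, x1, x2 in the first row,
# x3, x4 in the second row (the cell in row 2, column 3 holds no data).
queries = [
    [1, 1, 1, 0, 0],  # sum of row 1
    [0, 0, 0, 1, 1],  # sum of row 2
    [1, 0, 0, 1, 0],  # sum of column 1
    [0, 1, 0, 0, 1],  # sum of column 2
    [0, 0, 1, 0, 0],  # sum of column 3 covers a single cell
]
print(inferable_cells(queries))

# A full 2 x 2 cube answered with all row and column sums exposes no cell.
full_2x2 = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
print(inferable_cells(full_2x2))
```

In the first layout only the cell covered by the single-cell column sum is exposed; the remaining four cells stay undetermined because the other answers are linearly dependent.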

  6. Inference Control Goal • Reject queries that may result in an inference problem • Answer as many other queries as we can [Diagram labels: Application (Data Miner), Query Q, Answer/reject, OLAP Server, Database, Data Warehouse]

  7. Related Work • A lot of work on statistical databases • Survey • Differences • Restriction on OLAP queries • Structure of data cube • Online response time

  8. Related Work • A similar scheme • Our Advantages • Much easier approach • A tighter bound • More general framework

  9. Definition: Query • 1-dimensional queries • 2-dimensional queries

  10. Data Cube and Lattice of Cuboids

  11. Definition: Query • There exists a unique cuboid S such that a cell of S is the aggregation of W. • Suppose that S is a k-dimensional cuboid. The dimensionality of Q is defined to be n - k.
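
A short Python sketch of the lattice of cuboids and of the dimensionality rule stated above (a query answered by a cell of a k-dimensional cuboid in an n-dimensional cube has dimensionality n - k). The dimension names are hypothetical placeholders, and representing a cuboid as the tuple of dimensions it keeps is an assumption made for illustration.

```python
from itertools import combinations


def lattice_of_cuboids(dimensions):
    """Group all cuboids of a data cube by how many dimensions they keep.

    Level n holds the base cuboid (all dimensions); level 0 holds the apex
    cuboid (everything aggregated away).
    """
    n = len(dimensions)
    return {k: list(combinations(dimensions, k)) for k in range(n + 1)}


def query_dimensionality(n, k):
    """Dimensionality of a query answered by a cell of a k-dimensional cuboid."""
    return n - k


# Hypothetical 3-dimensional cube.
dimensions = ("time", "item", "location")
lattice = lattice_of_cuboids(dimensions)

for k, cuboids in lattice.items():
    q_dim = query_dimensionality(len(dimensions), k)
    print(f"{k}-dimensional cuboids ({q_dim}-dimensional queries): {cuboids}")
```

An n-dimensional cube has 2^n cuboids in total, so the three hypothetical dimensions above give eight.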

  12. Definition: Compromisability • SU = Sales amount of used books in Feb

  13. Definition: Compromisability • Compromisability • Direct inference • Compromisability <= 1 [Figure values: 2, 1, 2, 5]

  14. Cardinality-based Inference Control • S3, ST: Minimum compromisability = 2, 2^1*(4+3) - 2*2^2 - 1 = 5 > 2 • + S1, SB: Minimum compromisability = 2, 2^1*(4+3) - 2*2^2 - 1 = 5 = 5 • + S1, SD: Minimum compromisability = 2, 2^1*(4+3) - 2*2^2 - 1 = 5 > 4
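
The threshold expression on this slide, with its exponents written out as 2^1 and 2^2, evaluates as follows. Only the arithmetic is checked here; the transcript does not state the general formula or what the operands 4 and 3 stand for, so no interpretation of them is assumed.

```latex
\begin{align*}
2^{1}\,(4+3) - 2\cdot 2^{2} - 1 &= 2\cdot 7 - 2\cdot 4 - 1 \\
                                &= 14 - 8 - 1 \\
                                &= 5
\end{align*}
```

The slide then compares this value against 2, 5, and 4 for the three successive query sets.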

  15. Our Approach • A k-dimensional query Q(F, W) can be safely answered if every (k+1)-dimensional dice X’ in X that • Contains W as a subset • Can be queried as a cell of an (n-k-1)-dimensional cuboid • satisfies [equation omitted]

  16. Comparison with Previous Result vs.

  17. Proof of Our Bound • Basic idea • Inference: H(x|AQ) = 0
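
A brute-force Python sketch of the condition H(x|AQ) = 0 on a toy cube. It assumes, purely for illustration, that every cell is drawn independently and uniformly from a small integer domain; the slides do not specify a prior, and the cell layout and queries below are hypothetical. A cell is fully determined by the query answers exactly when its conditional entropy given those answers is zero.

```python
from collections import defaultdict
from itertools import product
from math import log2


def conditional_entropy(queries, cell, domain):
    """H(x_cell | query answers) in bits.

    Assumes (for illustration only) that cells are independent and uniform
    over `domain`. Each query is a coefficient vector over the cells.
    """
    num_cells = len(queries[0])
    # Group every candidate cube by the answers it would produce, counting
    # how often each value of the target cell occurs within the group.
    by_answers = defaultdict(lambda: defaultdict(int))
    total = 0
    for cube in product(domain, repeat=num_cells):
        answers = tuple(sum(c * x for c, x in zip(q, cube)) for q in queries)
        by_answers[answers][cube[cell]] += 1
        total += 1

    entropy = 0.0
    for value_counts in by_answers.values():
        group = sum(value_counts.values())
        for count in value_counts.values():
            p = count / group
            entropy -= (group / total) * p * log2(p)
    return entropy


# Hypothetical 2 x 3 cube with one empty cell (cells x0..x4), answered with
# all row and column sums; the last column sum covers only x2.
queries = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
]
domain = (0, 1, 2)

h_exposed = conditional_entropy(queries, cell=2, domain=domain)
h_hidden = conditional_entropy(queries, cell=0, domain=domain)
print(f"H(x2 | answers) = {h_exposed:.3f} bits")
print(f"H(x0 | answers) = {h_hidden:.3f} bits")
```

Exhaustive enumeration only works for tiny cubes and domains; it is meant to make the definition concrete, not to serve as an inference-control procedure.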

  18. An Information-Theoretic Definition

  19. An Information-Theoretic Definition • Let [equation omitted]; we have [equation omitted] • Thus, no inference problem exists in a data cube X if [equation omitted]

  20. Bounds on fmax(t0)

  21. Maximum Non-Compromisable Data Cube

  22. Main Theorem • Let [equation omitted]; we have [equation omitted]

  23. Final Remarks • Future Work • Quantitative measure of the inference problem • Combination of randomization and inference control approaches

  24. Thank you • Questions
