Secure XML Publishing without Information Leakage in the Presence of Data Inference

This research aims to address the challenge of publishing XML documents without leaking sensitive data, even when users can infer information using common knowledge. The study proposes methods to define sensitive data, describe common knowledge, compute inferred documents, and prevent information leakage.

Presentation Transcript


  1. Secure XML Publishing without Information Leakage in the Presence of Data Inference. Xiaochun Yang (Northeastern University, Liaoning, China), Chen Li

  2. Example: Hospital XML data. [Figure: a hospital element with patient children (1) to (4), each having pname, ward, and disease: (1) Alice, W305, leukemia; (2) Betty, W305, leukemia; (3) Cathy, W305, leukemia; (4) Tom, W403, cancer; ...] Goal: hide Alice’s disease. Common knowledge: patients in the same ward have the same disease. Since Betty and Cathy are also in W305 with leukemia, deleting Alice’s disease node alone does not hide it.

  3. Problem statement: How to publish an XML document without leaking sensitive data, even if public users can do inference using common knowledge? Outline: •  Information Leakage • Defining sensitive data • Describing common knowledge • Computing inferred documents • Prevent information leakage • Experiments

  4. Defining sensitive data using XQuery. [Figure: the hospital document with patients (1) Alice, (2) Betty, and (3) Cathy, all in ward W305 with disease leukemia, next to regulating query A1: a patient node with pname Alice and a disease node marked * as the sensitive target.] • Map the query to the XML tree. • For each mapping, the node that the * node maps to is sensitive.
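
As an illustration of the mapping step, here is a minimal Python sketch, assuming the regulating query A1 can be written as the path `patient[pname='Alice']/disease`, whose last step is the * target. It uses the standard library's `xml.etree.ElementTree`, which supports only a small XPath subset (not XQuery); the document string and the helper `sensitive_nodes` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of the hospital document on the slide.
HOSPITAL_XML = """<hospital>
  <patient><pname>Alice</pname><ward>W305</ward><disease>leukemia</disease></patient>
  <patient><pname>Betty</pname><ward>W305</ward><disease>leukemia</disease></patient>
  <patient><pname>Cathy</pname><ward>W305</ward><disease>leukemia</disease></patient>
</hospital>"""


def sensitive_nodes(root, pattern):
    """Return the elements bound to the target (*) node, one per mapping.

    `pattern` is an ElementTree path (limited XPath) whose last step is the target.
    """
    return root.findall(pattern)


root = ET.fromstring(HOSPITAL_XML)
# A1: a patient whose pname is Alice; the disease child is the * target.
hidden = sensitive_nodes(root, "patient[pname='Alice']/disease")
print([e.text for e in hidden])  # ['leukemia']
```

Each element returned is one node that must not be published.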

  5. Common Knowledge • Represented as XML constraints • Could be obtained in various ways, e.g., • possible schema • analysis from the published data

  6. Common Constraints • Child constraints: //p → //p/c, e.g., //patient → //patient/pname • Descendant constraints: //p → //p//d, e.g., //patient → //patient//disease • Functional dependencies: //p/a → //p/b, e.g., //patient/ward → //patient/disease: for two patients with wards w1, w2 and diseases d1, d2, if w1 = w2 then d1 = d2 (value equality).
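
A minimal sketch of checking whether a document satisfies each kind of constraint, assuming one-step paths, treating an element's text as its value, and comparing only the first a and b child of each p element. The function names and both documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def child_constraint_holds(root, p, c):
    """//p -> //p/c: every p element has a c child."""
    return all(e.find(c) is not None for e in root.iter(p))


def descendant_constraint_holds(root, p, d):
    """//p -> //p//d: every p element has a d descendant."""
    return all(e.find(".//" + d) is not None for e in root.iter(p))


def fd_holds(root, p, a, b):
    """//p/a -> //p/b: p elements with equal a values have equal b values."""
    seen = {}
    for e in root.iter(p):
        key, val = e.findtext(a), e.findtext(b)
        if key is None or val is None:
            continue  # the dependency says nothing about missing nodes
        if seen.setdefault(key, val) != val:
            return False
    return True


DOC = ET.fromstring(
    "<hospital>"
    "<patient><pname>Alice</pname><ward>W305</ward><disease>leukemia</disease></patient>"
    "<patient><pname>Betty</pname><ward>W305</ward><disease>leukemia</disease></patient>"
    "<patient><pname>Tom</pname><ward>W403</ward><disease>cancer</disease></patient>"
    "</hospital>")
# Same ward, different diseases, no names: violates the FD and the child constraint.
BAD = ET.fromstring(
    "<hospital>"
    "<patient><ward>W1</ward><disease>x</disease></patient>"
    "<patient><ward>W1</ward><disease>y</disease></patient>"
    "</hospital>")

print(fd_holds(DOC, "patient", "ward", "disease"))  # True
print(fd_holds(BAD, "patient", "ward", "disease"))  # False
```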

  7. Modify a partial document using constraints. C1: //patient → //patient/pname. [Figure: a partial document P with two patients in ward W305 and one disease value leukemia, and C1(P), in which each patient has gained a pname node whose value is unknown.]
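
A minimal sketch of applying a child constraint such as C1 to a partial document, assuming an element with no text stands for a node known to exist whose value is unknown. The partial document and the helper name are hypothetical.

```python
import xml.etree.ElementTree as ET


def apply_child_constraint(root, parent, child):
    """Apply //parent -> //parent/child to a partial document in place.

    Each <parent> without a <child> gets one with no text, i.e. a node
    known to exist whose value is unknown. Returns the number of nodes added.
    """
    added = 0
    for e in list(root.iter(parent)):  # snapshot, since we modify while looping
        if e.find(child) is None:
            ET.SubElement(e, child)
            added += 1
    return added


# Hypothetical partial document P: names hidden, wards and one disease published.
P = ET.fromstring(
    "<hospital>"
    "<patient><ward>W305</ward></patient>"
    "<patient><ward>W305</ward><disease>leukemia</disease></patient>"
    "</hospital>")

n = apply_child_constraint(P, "patient", "pname")  # C1
print(n)  # 2
print(ET.tostring(P, encoding="unicode"))
```

Applying C1 a second time adds nothing, since every patient already has a pname node.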

  8. Floating branch. Apply a sequence of constraints <C2, C3>, where C2: //patient → //patient//disease and C3: //patient/ward → //patient/disease. [Figure: the result; C2 adds a disease node whose exact position below the patient is unknown (a floating branch), and C3 adds a disease node with value leukemia.]

  9. Another sequence of constraints: <C3, C2>, with C2: //patient → //patient//disease and C3: //patient/ward → //patient/disease. [Figure: the result of applying C3 and then C2 to the same partial document.]

  10. [Figure: P1, the result of <C2, C3>, beside P2, the result of <C3, C2>.] They look different! • They have the same amount of “information”. • We introduced a concept called “m-equivalence” (see the paper).

  11. Theorem • Given a partial document P and a set of constraints C, there is a document M, inferable from P using a sequence of constraints in C, that m-contains the document inferred by any constraint sequence. • M is computable using a greedy approach. • M is unique under m-equivalence.
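
The greedy idea can be sketched as a fixpoint loop: keep applying child constraints and functional dependencies until none of them adds anything. This is a simplified illustration, not the paper's algorithm: it assumes empty elements mean "value unknown", omits descendant constraints (so there are no floating branches), and does not test m-equivalence. All names and the input document are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def apply_child(root, p, c):
    """Child constraint //p -> //p/c: add an empty (unknown-value) c where missing."""
    changed = False
    for e in list(root.iter(p)):
        if e.find(c) is None:
            ET.SubElement(e, c)
            changed = True
    return changed


def apply_fd(root, p, a, b):
    """FD //p/a -> //p/b: copy a known b value to p elements with the same a value."""
    known = {}
    for e in root.iter(p):
        key, val = e.findtext(a), e.findtext(b)
        if key and val:  # both values known
            known.setdefault(key, val)
    changed = False
    for e in list(root.iter(p)):
        key = e.findtext(a)
        if not key or key not in known:
            continue
        node = e.find(b)
        if node is None:
            ET.SubElement(e, b).text = known[key]
            changed = True
        elif not node.text:  # node exists but its value was unknown
            node.text = known[key]
            changed = True
    return changed


def maximal_inferred(P, child_constraints, fds):
    """Greedily apply all constraints until nothing changes; P is not modified."""
    M = copy.deepcopy(P)
    changed = True
    while changed:
        changed = False
        for p, c in child_constraints:
            changed |= apply_child(M, p, c)
        for p, a, b in fds:
            changed |= apply_fd(M, p, a, b)
    return M


P = ET.fromstring(
    "<hospital>"
    "<patient><ward>W305</ward></patient>"
    "<patient><pname>Betty</pname><ward>W305</ward><disease>leukemia</disease></patient>"
    "</hospital>")
M = maximal_inferred(P,
                     child_constraints=[("patient", "pname")],  # C1
                     fds=[("patient", "ward", "disease")])      # C3
print(ET.tostring(M, encoding="unicode"))
```

The loop terminates because each step only adds a node or fills an empty value, and there are finitely many of either.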

  12. [Diagram: partial document P → inference → maximal inferred document M; a mapping from regulating query A into M means information leakage.]
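
The leakage test is a mapping from A into M. A minimal sketch, assuming M has already been computed and that an empty element represents a node whose value is unknown, so mapping the * node onto it reveals nothing. Both inferred documents and the helper `leaked_values` are hypothetical.

```python
import xml.etree.ElementTree as ET


def leaked_values(M, name):
    """Values the * node of A (patient[pname=name]/disease) maps to in M.

    Empty elements stand for nodes whose value is unknown; mapping onto
    them reveals nothing, so they are skipped.
    """
    found = []
    for p in M.iter("patient"):
        if p.findtext("pname") != name:
            continue
        for d in p.findall("disease"):
            if d.text:  # value known, so A maps and the value leaks
                found.append(d.text)
    return found


# Hypothetical inferred documents: in the first, Alice's ward W305 was
# published, so the FD restored her disease; in the second her ward was
# hidden and only an empty disease node could be inferred.
M_LEAKY = ET.fromstring(
    "<hospital>"
    "<patient><pname>Alice</pname><ward>W305</ward><disease>leukemia</disease></patient>"
    "<patient><pname>Betty</pname><ward>W305</ward><disease>leukemia</disease></patient>"
    "</hospital>")
M_SAFE = ET.fromstring(
    "<hospital>"
    "<patient><pname>Alice</pname><disease/></patient>"
    "<patient><pname>Betty</pname><ward>W305</ward><disease>leukemia</disease></patient>"
    "</hospital>")

print(leaked_values(M_LEAKY, "Alice"))  # ['leukemia']
print(leaked_values(M_SAFE, "Alice"))   # []
```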

  13. Talk Outline • Information Leakage •  Prevent information leakage • Experiments

  14. Formal Problem • Given: an XML document D, a regulating query A, and constraints C1, …, Ck. • Find: a partial document P without information leakage (a “valid partial document”) that has as much data as possible. • We developed an algorithm for solving this problem.
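
To make the problem statement concrete, here is an exhaustive-search sketch. It is not the authors' algorithm. It encodes each patient as a flat Python dict (a hypothetical simplification of the XML tree), infers only through the functional dependency //patient/ward → //patient/disease, and reads "as much data as possible" as "fewest removed nodes". Its running time is exponential, so it only illustrates what a valid partial document is.

```python
from itertools import combinations

# Hypothetical flat encoding of the hospital example: one dict per patient.
D = [
    {"pname": "Alice", "ward": "W305", "disease": "leukemia"},
    {"pname": "Betty", "ward": "W305", "disease": "leukemia"},
    {"pname": "Cathy", "ward": "W305", "disease": "leukemia"},
]
SENSITIVE = {(0, "disease")}  # A(D): Alice's disease


def infer(records, lhs="ward", rhs="disease"):
    """Close the records under the FD lhs -> rhs."""
    recs = [dict(r) for r in records]
    changed = True
    while changed:
        changed = False
        for r in recs:
            if lhs not in r or rhs in r:
                continue
            for s in recs:
                if s.get(lhs) == r[lhs] and rhs in s:
                    r[rhs] = s[rhs]
                    changed = True
                    break
    return recs


def publish(doc, removed):
    """The partial document left after deleting each (record, field) in removed."""
    return [{f: v for f, v in r.items() if (i, f) not in removed}
            for i, r in enumerate(doc)]


def leaks(partial, name="Alice"):
    """Does A (a patient named `name` with a disease) map into the inferred document?"""
    return any(r.get("pname") == name and "disease" in r for r in infer(partial))


def valid_partial_document(doc):
    """Remove A(D) plus the fewest extra nodes so that nothing leaks."""
    candidates = [(i, f) for i, r in enumerate(doc) for f in r
                  if (i, f) not in SENSITIVE]
    for k in range(len(candidates) + 1):  # try smaller removals first
        for extra in combinations(candidates, k):
            P = publish(doc, SENSITIVE | set(extra))
            if not leaks(P):
                return P, set(extra)
    raise RuntimeError("no valid partial document")


P, extra = valid_partial_document(D)
print(extra)  # {(0, 'pname')}
```

On this example, removing only Alice's disease still leaks, while one extra removal suffices. Alice's ward would work equally well; the name is returned only because it comes first in enumeration order.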

  15. Example [Figure: the hospital document with patients (1) Alice, (2) Betty, and (3) Cathy, all in ward W305 with disease leukemia.] Regulating query A: a patient with pname Alice whose disease node is marked * as the sensitive target. Functional dependency: //patient/ward → //patient/disease

  16. Remove sensitive data A(D). [Figure: the hospital document; the node matched by the * node of A, Alice’s disease, is removed.] Remaining document: D - A(D)

  17. Compute the maximal inferred document M of D - A(D). [Figure: M, in which the functional dependency restores leukemia as Alice’s disease, because all three patients share ward W305.] There is a mapping from A to M, so information is leaked.

  18. AND/OR Graphs [Figure: the hospital document and regulating query A, with the top of an AND/OR graph: START has an OR connector over the nodes Alice (1) and leukemia (1).]

  19. [Figure: the full AND/OR graph. START has an OR connector over Alice (1) and leukemia (1); leukemia (1) has an AND connector over two OR connectors, one over W305 (1), W305 (2), and leukemia (2), the other over W305 (1), W305 (3), and leukemia (3).]

  20. Solution graphs Requirements: • A connected subgraph including START. • Each node in the subgraph keeps its successor connectors. • OR connector: keep one of its successors. • AND connector: keep all its successors. [Figure: two example solution graphs: START → OR → Alice (1); and START → OR → leukemia (1) → AND → two OR connectors that both keep W305 (1).]
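
A minimal sketch of enumerating solution graphs under the rules above, assuming a hypothetical encoding of the AND/OR graph from slides 18 to 20 and reading each node kept in a solution graph as a node to hide (our interpretation of the slides). Each solution is reduced to its set of data nodes; the cost counts nodes beyond the already-removed sensitive one, and picking the cheapest is our assumption about "as much data as possible". The graph here is acyclic, as drawn.

```python
from itertools import product

# Hypothetical encoding of the AND/OR graph. A node maps to its connector;
# a connector is ("AND" | "OR", successors), and a successor may itself be
# a connector. Nodes absent from GRAPH are leaves.
GRAPH = {
    "START": ("OR", ["Alice(1)", "leukemia(1)"]),
    "leukemia(1)": ("AND", [
        ("OR", ["W305(1)", "W305(2)", "leukemia(2)"]),  # inference via patient (2)
        ("OR", ["W305(1)", "W305(3)", "leukemia(3)"]),  # inference via patient (3)
    ]),
}
ALREADY_REMOVED = {"leukemia(1)"}  # A(D), hidden in any case


def solutions(graph, item):
    """All solution graphs below `item`, each as a frozenset of data nodes."""
    if isinstance(item, tuple):  # a connector
        kind, successors = item
        sub = [solutions(graph, s) for s in successors]
        if kind == "OR":  # keep one successor
            return set().union(*sub)
        # AND: keep all successors, choosing one solution from each
        return {frozenset().union(*combo) for combo in product(*sub)}
    own = frozenset() if item == "START" else frozenset([item])
    if item not in graph:  # leaf data node
        return {own}
    return {own | s for s in solutions(graph, graph[item])}


def cost(solution):
    """Extra nodes to remove beyond the sensitive ones."""
    return len(solution - ALREADY_REMOVED)


sols = solutions(GRAPH, "START")
best = min(cost(s) for s in sols)
minimal = {s for s in sols if cost(s) == best}
print(best)  # 1
```

Both cheapest solutions hide one extra node: either Alice's name, or her ward W305 (1), which blocks both inference paths at once.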

  21. Talk Outline • Information leakage • Prevent Information Leakage •  Experiments

  22. Experiments • Evaluate the effect of data inference on security, and the effectiveness of our technique • XML constraints • Data sets • course_washington.xml • http://anhai.cs.uiuc.edu/archive/data/courses/washington • 3,904 courses, 162,102 nodes • dblp.xml • http://www.informatik.uni-trier.de/ley/db • About 427,000 publications, 8,728,000 nodes

  23. Information Leakage • Sensitive nodes defined by regulating queries • A1: In course_washington.xml, “Hide codes of all courses.” • A2: In dblp.xml, “Hide authors who published papers in 2001.” • Constraint: //dblp/pub/title → //dblp/pub/author

  24. Effect of number of sensitive nodes • Sensitive nodes randomly selected from the tree [Figure: child and descendant constraints in course_washington.xml]

  25. Effect of number of constraints course_washington.xml

  26. Removing nodes to prevent leakage Course data set DBLP data set

  27. Conclusion and Future Work • Contributions: • Formulated the problem of publishing XML documents without information leakage due to data inference • Showed effect of constraints on inference • Algorithm for finding a valid partial document of a given document • Future work: • Positive regulating queries • Quantify information amount/importance
