privacy preserving schema matching using mutual information n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Privacy-Preserving Schema Matching Using Mutual Information PowerPoint Presentation
Download Presentation
Privacy-Preserving Schema Matching Using Mutual Information

Loading in 2 Seconds...

play fullscreen
1 / 10

Privacy-Preserving Schema Matching Using Mutual Information - PowerPoint PPT Presentation


  • 73 Views
  • Uploaded on

Privacy-Preserving Schema Matching Using Mutual Information. Isabel F. Cruz University of Illinois at Chicago Roberto Tamassia Danfeng Yao Brown University. Supported in part by the National Science Foundation under ITR awards IIS–0326284, IIS–0324846, and IIS–0513553.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Privacy-Preserving Schema Matching Using Mutual Information' - dick


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
privacy preserving schema matching using mutual information

Privacy-Preserving Schema Matching Using Mutual Information

Isabel F. Cruz

University of Illinois at Chicago

Roberto Tamassia Danfeng Yao

Brown University

Supported in part by the National Science Foundation under ITR awards IIS–0326284, IIS–0324846, and IIS–0513553

DBSec 2007, Redondo Beach, CA

heterogeneous databases
Heterogeneous databases

Query: join patients’ records in medical database A, B, and C

DBSec 2007, Redondo Beach, CA

the need for schema matching
The need for schema matching

Database A

Database B

How to find out the correspondence of attribute names in A and B ?

DBSec 2007, Redondo Beach, CA

privacy in schema matching
Privacy in schema matching
  • Data interoperability requires schema matching
  • However, data owners may consider schema sensitive
  • Need to develop privacy-preserving schema matching methods
  • Related work:
    • Privacy-preserving data sharing [Clifton Kantarcioglu Doan Schadow Vaidya Elmagarmid Suciu 04]
    • Privacy-preserving ontology matching [Mitra Liu Pan 05]
    • Privacy-preserving access control to heterogeneous databases [Mitra Liu Pan Atluri06]
    • Privacy-preserving schema and data matching [Scannapieco Figotin Bertino Elmagarmid 07]

DBSec 2007, Redondo Beach, CA

overview of our approach
Key observation 1: same attributes have similar data distributions and correlate similarly to other attributes

Probability distribution of attributes (e.g., heart rate, age, height, blood pressure)

Mutual information (MI) captures the correlation of attributes (e.g., age and blood pressure)

Key observation 2: we reduce private schema matching to 2-party private set intersection

Only intersected elements are returned and nothing else

Our approach for private schema matching

View self and mutual information (MI) values of each schema as sets of numbers

Match the MIs of two schemas using private set intersection

Overview of our approach

DBSec 2007, Redondo Beach, CA

building block pair wise mutual information mi

Patient type

1.5

1.0

1.5

2.0

1.0

Heart rate

Blood type

1.0

Node A

1.5

1.0

1.5

Node B

Node C

2.0

1.0

1.0

Building block: pair-wise mutual information (MI)

Assume that schemas are not private info [Kang Naughton 03]

1. Party A with schema A computes MI and constructs a graph

2. Party B with schema B constructs its graph

3. Both parties then find a correspondence of the two graphs

DBSec 2007, Redondo Beach, CA

building block privacy preserving set intersection
Building block: Privacy-preserving set intersection
  • An efficient protocol based on homomorphic encryption was proposed by [Freedman Nissim Pinkas 04]

Interactive protocol

3, 6, 15, 20, 88

3, 7, 17, 20, 80

Output

3, 20

Alice and Bob only learn the intersected elements

Secure against malicious adversaries

DBSec 2007, Redondo Beach, CA

our approach privacy preserving schema mapping

(1.5, 1.5, 1.0)

MI set

(1.0, 1.0, 1.0)

Our approach: Privacy-preserving schema mapping
  • Two players: A and B, each with a private schema
  • A and B compute MI of schema attributes and graphs, respectively
    • Each attribute has a MI set: attribute entropy and pair-wise MI
  • A and B sort the entropies (self MIs) of attributes, respectively
  • For each attribute, A and B carry our private set intersection
    • If attributes match, then set intersection returns the entire MI set

Patient type

A

1.5

1.0

1.5

Blood type

2.0

1.0

Heart rate

1.0

DBSec 2007, Redondo Beach, CA

properties
Properties
  • Support of three types of schema mappings
    • One-to-one, onto, partial mappings
  • Security property
    • Basic metod secure against semi-honest adversaries
    • Advanced method (uses zero knowledge) secure against malicious adversaries
  • Complexity property
    • Assuming entropies of distinct attributes are different, we perform a linear (proportional to the number of attributes of A and B) number of privacy-preserving set intersections

Partial

Onto

One-to-one

DBSec 2007, Redondo Beach, CA

theorems
Theorems
  • Theorem 1 (Security): Assuming the existence of a private set intersection protocol against malicious adversaries, our privacy-preserving schema matching protocol for one-to-one, onto, and partial mappings is secure against malicious adversaries.
  • Definition: The multiplicity value mi of element ai in a list L with l elements and k distinct elements is the number of times element ai (1 ≤ i ≤ k) appears in L. The multiplicity sequence of L is (m1, m2, …, mk) where m1+ m2 + … + mk = l
  • Theorem 2 (Complexity): Consider a schema A with m attributes and a schema B with n attributes. Let (m1, m2, …, mk) be the multiplicity sequence of the entropy list of A and let (n1, n2, …, nk) be the multiplicity sequence of the entropy list of B by removing the elements not present in the entropy list of A. We have that the number of set intersections performed in our privacy-preserving schema matching protocol is at most: k

 mi ni

i=1

DBSec 2007, Redondo Beach, CA