Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

J. Li, Y. Tao, and X. Xiao

SIGMOD 2008

Presented by Hongwei Tian


Outline

  • What is PPDP

    • Existing Privacy Principles

  • Proximity Attack

    • (ε, m)-anonymity

    • Determine ε and m

    • Algorithm

  • Experiments and Conclusion


Privacy-Preserving Data Publishing (PPDP)

  • A true story from Massachusetts, 1997

    • GIC released "anonymized" medical records of state employees

    • The public voter list could be bought for 20 dollars

    • Linking the two re-identified the medical records of Governor Weld


PPDP

  • Privacy

    • Sensitive information of individuals should be protected in the published data

    • Pushes toward more heavily anonymized data

  • Utility

    • The published data should remain useful

    • Pushes toward more accurate data


PPDP

  • Anonymization Techniques

    • Generalization (sketched below)

      • Specific value -> general value

      • Maintains the semantic meaning

        • 78256 -> 7825*, UTSA -> University, 28 -> [20, 30]

    • Perturbation

      • Replaces one value with another, random value

      • Huge information loss -> poor utility
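As a concrete illustration, here is a minimal Python sketch of such generalization rules; the masking depth and the bucket width are illustrative choices, not part of the paper.

```python
# Illustrative generalization rules: specific value -> general value,
# keeping the semantic meaning (78256 -> 7825*, 28 -> [20, 30)).

def generalize_zip(zipcode: str, keep: int = 4) -> str:
    """Keep a prefix of the zip code and mask the remaining digits."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)

def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with its enclosing interval (half-open here for simplicity)."""
    lo = (age // width) * width
    return f"[{lo}, {lo + width})"

print(generalize_zip("78256"))  # 7825*
print(generalize_age(28))       # [20, 30)
```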


PPDP

  • Example of Generalization


Some Existing Privacy Principles

  • Generalization

    • SA – Categorical

      • k-anonymity

      • l-diversity, (α, k)-anonymity, m-invariance, …

      • (c, k)-safety, Skyline-privacy

    • SA – Numerical

      • (k, e)-anonymity, Variance Control

      • t-closeness

      • δ-presence


Next…

  • What is PPDP

    • Existing Privacy Principles

  • Proximity Attack

    • (ε, m)-anonymity

    • Determine ε and m

    • Algorithm

  • Experiments and Conclusion


Proximity Attack

  • Even if the exact SA value of a tuple t stays hidden, an adversary breaches privacy when he can infer, with high confidence, that t.SA falls within a small neighborhood of its real value


(ε, m)-anonymity

  • I(t)

    • The private neighborhood of tuple t

    • Absolute: I(t) = [t.SA − ε, t.SA + ε]

    • Relative: I(t) = [t.SA·(1 − ε), t.SA·(1 + ε)]

  • P(t)

    • The risk of a proximity breach for tuple t

    • P(t) = x / |G|, where G is the equivalence class of t and x is the number of tuples in G whose SA value falls in I(t) (see the sketch below)
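Below is a minimal Python sketch of these two definitions, treating an equivalence class G as a plain list of SA values; the demo values are hypothetical but chosen to reproduce the P(t1) = 3/4 example on the next slide.

```python
def neighborhood(sa: float, eps: float, relative: bool = False):
    """I(t): [sa - eps, sa + eps] (absolute) or [sa(1 - eps), sa(1 + eps)] (relative)."""
    return (sa * (1 - eps), sa * (1 + eps)) if relative else (sa - eps, sa + eps)

def breach_risk(sa: float, G: list, eps: float, relative: bool = False) -> float:
    """P(t) = x / |G|, with x = number of SA values in G that fall inside I(t)."""
    lo, hi = neighborhood(sa, eps, relative)
    x = sum(1 for v in G if lo <= v <= hi)
    return x / len(G)

G = [1000, 995, 1015, 2000]          # hypothetical SA values of one class
print(breach_risk(1000, G, eps=20))  # 0.75, i.e. P(t1) = 3/4 as on the next slide
```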


(ε, m)-anonymity

  • ε = 20, t1.SA = 1000

  • I(t1) = [980, 1020]

  • x = 3, |G| = 4

  • P(t1) = 3/4


(ε, m)-anonymity

  • Principle

    • Given a real value ε and an integer m ≥ 1, a generalized table T* fulfills absolute (relative) (ε, m)-anonymity, if

      P(t) ≤ 1/m

      for every tuple t ∈ T

    • Larger ε and larger m mean a stricter privacy requirement (a checking sketch follows)
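A minimal sketch of checking the principle over a published table, with each equivalence class again reduced to its list of SA values and absolute neighborhoods assumed:

```python
def is_em_anonymous(classes: list, eps: float, m: int) -> bool:
    """True iff P(t) = x/|G| <= 1/m for every tuple t of every class G."""
    for G in classes:
        for sa in G:
            x = sum(1 for v in G if sa - eps <= v <= sa + eps)
            if x * m > len(G):   # integer comparison: x/|G| > 1/m
                return False
    return True
```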


(ε, m)-anonymity

  • What is the meaning of m?

    • |G| ≥ m

    • The best situation: for any two tuples ti and tj in G, tj.SA ∉ I(ti) and ti.SA ∉ I(tj), so that P(t) = 1/|G| ≤ 1/m

    • Similar to l-diversity when the equivalence class has l tuples with distinct SA values


(ε, m)-anonymity

  • How to ensure that tj.SA does not fall in I(ti)?

    • Sort all tuples in G in ascending order of their SA values

    • Then it suffices that |j − i| ≥ max{ |left(tj, G)|, |right(ti, G)| }


(ε, m)-anonymity

  • Let maxsize(G) = max∀t∈G { max{ |left(t, G)|, |right(t, G)| } }

  • Any two tuples placed in the same bucket must satisfy |j − i| ≥ maxsize(G) (see the sketch below)
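A minimal sketch of maxsize(G) under absolute neighborhoods. The slide does not define left(t, G) and right(t, G); here they are assumed to count the tuples from t leftward (rightward) in the sorted order, t itself included, whose SA values fall in I(t); this convention reproduces maxsize(G) = 2 in the (6, 2) example later in the deck.

```python
def maxsize(sa_values: list, eps: float) -> int:
    """max over t in G of max{|left(t, G)|, |right(t, G)|}, absolute I(t)."""
    sa = sorted(sa_values)
    best = 0
    for i, s in enumerate(sa):
        left = sum(1 for v in sa[: i + 1] if v >= s - eps)   # t included (assumption)
        right = sum(1 for v in sa[i:] if v <= s + eps)       # t included (assumption)
        best = max(best, left, right)
    return best

print(maxsize([10, 40, 20, 25, 50, 30], eps=6))  # 2, matching the later example
```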


(ε, m)-anonymity

  • Partitioning

    • Sort the tuples in G in ascending order of their SA values

    • Hash the i-th tuple into the j-th bucket using j = (i mod maxsize(G)) + 1

    • Thus no tuple in a bucket falls into the neighborhood of any other tuple in the same bucket (see the sketch below)
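A minimal sketch of the bucketization; i is 1-based as on the slide, so the 0-based bucket index is simply i mod g. The value g = maxsize(G) is taken from the previous sketch.

```python
def partition(sa_values: list, g: int) -> list:
    """Sort by SA, then send the i-th tuple (1-based) to bucket (i mod g) + 1."""
    buckets = [[] for _ in range(g)]
    for i, sa in enumerate(sorted(sa_values), start=1):
        buckets[i % g].append(sa)   # 0-based index of bucket j = (i mod g) + 1
    return buckets

# g = maxsize(G) = 2 for this class (see the previous sketch):
print(partition([10, 40, 20, 25, 50, 30], g=2))  # [[20, 30, 50], [10, 25, 40]]
```

Consecutive members of a bucket sit g positions apart in the sorted order, which is exactly the |j − i| ≥ maxsize(G) condition of the previous slide.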


(ε, m)-anonymity

  • (6, 2)-anonymity

    • Privacy is breached

    • P(t3) = 3/4 > 1/m = 1/2

  • Partitioning is needed

    • The tuples are already sorted in ascending SA order

    • g = maxsize(G) = 2

    • j = (i mod 2) + 1

    • New P(t3) = 1/2


Determine ε and m

  • Given ε and m

    • Check whether an equivalence class G satisfies (ε, m)-anonymity

    • Theorem: G has at least one (ε, m)-anonymous generalization iff m · maxsize(G) ≤ |G| (each of the maxsize(G) round-robin buckets must keep at least m tuples)

      • Scan the sorted tuples in G once to find maxsize(G)

      • This predicts whether G can be partitioned or not (see the sketch below)
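A minimal sketch of this test. The formula of the theorem did not survive extraction; the inequality m · maxsize(G) ≤ |G| used below is reconstructed from the bucket-size argument (each round-robin bucket must keep at least m tuples) and should be read as an assumption, not the paper's literal statement.

```python
def maxsize(sa_sorted: list, eps: float) -> int:
    """Largest one-sided neighborhood count (t included), as sketched earlier."""
    return max(
        max(sum(1 for v in sa_sorted[: i + 1] if v >= s - eps),
            sum(1 for v in sa_sorted[i:] if v <= s + eps))
        for i, s in enumerate(sa_sorted)
    )

def satisfiable(sa_values: list, eps: float, m: int) -> bool:
    """One scan decides it: G can be partitioned iff m * maxsize(G) <= |G|."""
    sa_sorted = sorted(sa_values)
    return m * maxsize(sa_sorted, eps) <= len(sa_sorted)

print(satisfiable([10, 40, 20, 25, 50, 30], eps=6, m=2))  # True: 2 * 2 <= 6
```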


Algorithm

  • Step 1: Splitting

    • Mondrian (ICDE 2006)

    • Splitting is based only on the QI attributes

    • Iteratively find the median of the frequency set on a selected QI dimension, cut G into G1 and G2, and make sure both G1 and G2 remain legal to partition (see the sketch below)
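A minimal sketch of the splitting loop, with tuples as dicts of QI attributes plus an SA value. The legality test is passed in as a predicate (for instance, the satisfiable check above applied to each half's SA values); the order in which QI dimensions are tried and the plain median cut are simplifications of Mondrian's frequency-set machinery.

```python
def split(tuples: list, qi_attrs: list, legal) -> list:
    """Cut at the median of a QI dimension whenever both halves stay legal."""
    for dim in qi_attrs:
        vals = sorted(t[dim] for t in tuples)
        median = vals[len(vals) // 2]
        g1 = [t for t in tuples if t[dim] < median]
        g2 = [t for t in tuples if t[dim] >= median]
        if g1 and g2 and legal(g1) and legal(g2):
            return split(g1, qi_attrs, legal) + split(g2, qi_attrs, legal)
    return [tuples]   # no legal cut on any QI dimension: this class is final

# e.g. legal = lambda G: satisfiable([t["SA"] for t in G], eps=6, m=2)
```

Keeping the privacy check behind a predicate mirrors the slide's point that the splitting itself looks only at the QI attributes.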


Algorithm

  • Splitting ((6, 2)-anonymity)

(Figure: splitting example over six tuples with SA values 10, 40, 20, 25, 50, 30.)


Algorithm

  • Step 2: Partitioning

    • Runs after Step 1 stops

    • Check every class G produced by splitting

      • Release G directly if it satisfies (ε, m)-anonymity

      • Otherwise, partition G and release the resulting buckets


Algorithm

  • Partitioning ((6, 2)-anonymity)

(Figure: partitioning example over the same six SA values 10, 40, 20, 25, 50, 30.)


Next…

  • What is PPDP

  • Evolution of Privacy Preservation

  • Proximity Attack

    • (ε, m)-anonymity

    • Determine ε and m

    • Algorithm

  • Experiments and Conclusion


Experiments

  • Real database SAL (http://ipums.org)

    • Attributes: Age, Birthplace, Occupation, and Income, with domains [16, 93], [1, 710], [1, 983], and [1k, 100k], respectively

    • 500K tuples

  • Compared against a perturbation method (OLAP, SIGMOD 2005)


Experiments - Utility

  • Count queries; a workload of 1,000 queries


Experiments - Utility


Experiments - Efficiency


Conclusion

  • Discussed most of the existing privacy principles in PPDP

  • Identified the proximity attack and proposed (ε, m)-anonymity to prevent it

  • Verified experimentally that the method is effective and efficient


Any Questions?

