an experimental study of association rule hiding techniques
Download
Skip this Video
Download Presentation
An Experimental Study of Association Rule Hiding Techniques

Loading in 2 Seconds...

play fullscreen
1 / 36

An Experimental Study of Association Rule Hiding Techniques - PowerPoint PPT Presentation


  • 209 Views
  • Uploaded on

An Experimental Study of Association Rule Hiding Techniques. Emmanuel Pontikakis* [email protected] Dept. of Computer Engineering and Informatics University of Patras Patra, Greece Vassilios Verykios* [email protected]

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'An Experimental Study of Association Rule Hiding Techniques' - venus


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
an experimental study of association rule hiding techniques

An Experimental Study of Association Rule Hiding Techniques

Emmanuel Pontikakis*

[email protected]

Dept. of Computer Engineering and Informatics

University of Patras

Patra, Greece

Vassilios Verykios*

[email protected]

Dept. of Computer and Communication EngineeringUniversity of ThessalyVolos, Greece

*Computer Technology Institute

Research Unit 3

Athens, Greece

outline
Outline
  • Introduction - Related Work
  • Distortion-based Techniques
  • Blocking-based Techniques
  • Comparison and Analysis
  • Conclusions
introduction

Data Mining

Association Rules

Hide Sensitive Rules

User

Introduction

Database

Changed

Database

related work
Related Work
  • Association Rule Hiding
    • Blocking-based Technique (Saygin, Verykios, Clifton)
    • Distortion-based (Sanitization) Technique – (Oliveira, Zaiane, Verykios, Dasseni)
outline5
Outline
  • Introduction - Related Work
  • Distortion-based Techniques
  • Blocking-based Techniques
  • Comparison and Analysis
  • Conclusion
distortion based techniques

Distortion

Algorithm

Distortion-based Techniques

Sample Database

Distorted Database

Rule A→C has:

Support(A→C)=80%

Confidence(A→C)=100%

Rule A→C has now:

Support(A→C)=40%

Confidence(A→C)=50%

distortion based techniques8
Distortion-based Techniques
  • Challenges/Goals:
    • To minimize the undesirable Side Effects that the hiding process causes to non-sensitive rules.
    • To minimize the number of 1’s that must be deleted in the database.
    • Algorithms must be linear in time as the database increases in size.
our proposal weight based sorting distortion algorithm wsda
Our Proposal: Weight-based Sorting Distortion Algorithm (WSDA)
  • High Level Description:
    • Input:
      • Initial Database
      • Set of Sensitive Rules
      • Safety Margin (for example 10%)
    • Output:
      • Sanitized Database
      • Sensitive Rules no longer hold in the Database
wsda algorithm
WSDA Algorithm
  • High Level Description:
    • 1st step:
      • Retrieve the set of transactions which support sensitive rule RS
      • For each sensitive rule RS find the number N1 of transaction in which, one item that supports the rule will be deleted
wsda algorithm11
WSDA Algorithm
  • High Level Description:
    • 2nd step:
      • For each rule Ri in the Database with common items with RScompute a weight w that denotes how strong is Ri
      • For each transaction that supports RS compute a priority Pi, that denotes how many strong rules this transaction supports
wsda algorithm12
WSDA Algorithm
  • High Level Description:
    • 3rd step:
      • Sort the N1 transactions in ascending order according to their priority value Pi
    • 4th step:
      • For the first N1 transactions hide an item that is contained in RS
wsda algorithm13
WSDA Algorithm
  • High Level Description:
    • 5th step:
      • Update confidence and support values for other rules in the database
experimental results of wsda algorithm

Rules Changed

In the Database

Itemsets Remained unaffected

in the Database

Experimental Results of WSDA algorithm
experimental results of wsda algorithm15

Average number of items

per transaction: 13/50

Average number of items

per transaction: 20/50

Experimental Results of WSDA algorithm
outline16
Outline
  • Introduction - Related Work
  • Distortion-based Techniques
  • Blocking-based Techniques
  • Comparison and Analysis
  • Conclusion
quality of data
Quality of Data
  • Sometimes it is dangerous to delete some items from the database (etc. medical databases) because the false data may create undesirable effects.
  • So, we have to hide the rules in the database by adding uncertainty without distorting the database.
blocking based techniques

Blocking

Algorithm

Blocking-based Techniques

Initial Database

New Database

Support and Confidence becomes marginal.

In New Database: 60% ≤ conf(A → C) ≤ 100%

modification of association rule definition
Modification of Association Rule Definition
  • A rule’s A→B confidence and support becomes marginal:

sup(A→B)[minsup(A→B), maxsup(A→B)]

conf(A→B) [minconf(A→B), maxconf(A→B)]

  • minsup(A→B)=
  • maxsup(A→B)=
modification of association rule definition20
Modification of Association Rule Definition
  • minconf(A→B)=
  • maxconf(A→B)=
negative border rules set nbrs definition
Negative Border Rules Set (NBRS) Definition
  • When a rule R has either
    • sup(R)>MST AND conf(R)<MCT

OR

    • sup(R)<MST AND conf(R)>MCT,

then we say that R belongs to NBRS.

privacy breaches definitions
Privacy Breaches Definitions
  • If an item i, some values of which, are hidden by ?’s, is contained in a sensitive rule, a privacy breach will occur if the adversary can assume that with c% confidence.
  • For a rule R with maxconf(R)>MCT, a privacy breach occurs if it can be estimated, with c% confidence, that R is either a sensitive or a ghost rule.
  • For a blocked item iin a specific transaction T, a privacy breach occurs if the adversary can estimate with c%confidence that its original value is either 0 or 1.
blocking based techniques24
Blocking-Based Techniques
  • Goals that an algorithm has to achieve:
  • To put a relatively small number of ?’s and reduce significantly the confidence of senstitive rules.
  • To minimize the undesirable side effects (rules and itemsets lost) by selecting the items in the appropriate transactions to change, and maximize the desirable side effects.
  • To modify the database in a way that an adversary cannot recover the original values of the database.
our proposal blocking algorithm ba
Our Proposal: Blocking Algorithm (BA)
  • High Level Description
    • 1st step:
      • For each sensitive rule RS (Rule RS has left itemset IL and right itemset IR) compute how many 0’s and 1’s you have to block, in order to reduce the confidence of RS.
    • 2nd step:
      • Find the set of transactions TR that support RS or the set of transactions TLpR’that support partially RS (support partially the left itemset and do not support the right itemset).
      • For each transaction in TRfind the rules Rcommonwith at least one common item with IRand for each transaction in TLpR’find the R’common∈NBRSwith at least one common item with IL.
      • Assign a weight wfor each Rcommonand a weight w’ for each R’common.
      • Assign a PTfor each transaction in T such as PTis large if transaction Ti has many Rcommon rules with large w, and a priority value PT’for each Ti’ such as PT’is small if transaction T has many Rcommon rules with large w’.
blocking algorithm
Blocking Algorithm
  • High Level Description
    • 3rd step:
      • Sort T∈TRstarting from them with lowest PTi. and sort T’∈TL’Rpstarting from them with highest PTi’.
    • 4th step:
      • For the first N1sorted T∈TRblock an item i∈IRand for the first N0sorted T∈TL’Rp block an item i∈ IL
    • 5th step:
      • Update values minconf(Ri),minsup(Ri), for all other rules that have been affected.
blocking based techniques27
Blocking-Based Techniques
  • Main Problems of blocking technique:
  • The maximum confidence of a sensitive rule cannot be reduced.
  • An adversary can infer the hidden values if he applies a smart inference technique, if the blocking algorithm does not add much uncertainty in the database.
  • Both 0’s and 1’s must be hidden, because if only 1’s were hidden the adversary would simply replace all the ?’s with 1’s and would restore easily the initial database.
  • Many ?’s must be inserted, if we don’t want an adversary to infer hidden data.
experimental results of blocking algorithm

Large Itemsets Remained after

The hiding process

Rules changed (%) after the

process

Experimental Results of Blocking Algorithm
experimental results of blocking algorithm 2

Databases with average

13 items per transaction

Databases with average

20 items per transaction

Experimental Results of Blocking Algorithm (2)
experimental results of blocking algorithm 3

Rules changed, when we

Change the proportion 0:1

Decision Tree Experiments

Misclassified Items (%)

Experimental Results of Blocking Algorithm (3)
outline31
Outline
  • Introduction - Related Work
  • Distortion-based Techniques
  • Blocking-based Techniques
  • Comparison and Analysis
  • Conclusions
outline33
Outline
  • Introduction - Related Work
  • Distortion-based Techniques
  • Blocking-based Techniques
  • Comparison and Analysis
  • Conclusions
conclusions
Conclusions
  • There are open research problems in Blocking Technique:
    • A) What techniques must be used in order to reduce the privacy breaches?
    • B) In what other ways can we prevent an adversary from inferring the association rules in the database?
    • C) Maybe applying a chi-square test to the final database reveal some correlations between the items
references
References
  • [Evfimienski et.al] Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, Johannes Gehrke. Privacy Preserving Mining of Association Rules. SIGKDD 2002, Edmonton, Alberta Canada.
  • Murat Kantarcioglou and Chris Clifton, Privacy Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data, In Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2002), 24–31.
  • Jaideep Vaidya and Chris Clifton, Privacy Preserving Association Rule Mining in Vertically Partitioned Data, In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2002), 639–644.
references36
References
  • Stanley R. M. Oliveira and Osmar R. Zaïane. Algorithms for Balacing Privacy and Knowledge Discovery in Association Rule Mining.  In Proc. of the Seventh International Database Engineering & Applications Symposium (IDEAS\'03), pp. 54-63, Hong Kong, July 16-18, 2003.
  • Yucel Saygin, Vassilios Verykios, and Chris Clifton, Using Unknowns to Prevent Discovery of Association Rules, SIGMOD Record 30 (2001), no. 4, 45–54.
  • S. Verykios, Ahmed K. Elmagarmid, Bertino Elisa, Yucel Saygin, and Dasseni Elena, Association Rule Hiding, IEEE Transactions on Knowledge and Data Engineering (2003).
ad