### An Experimental Study of Association Rule Hiding Techniques

Emmanuel Pontikakis*

Dept. of Computer Engineering and Informatics

University of Patras

Patra, Greece

Vassilios Verykios*

Dept. of Computer and Communication Engineering

University of Thessaly

Volos, Greece

*Computer Technology Institute

Research Unit 3

Athens, Greece

Outline

- Introduction - Related Work
- Distortion-based Techniques
- Blocking-based Techniques
- Comparison and Analysis
- Conclusions

Related Work

- Association Rule Hiding
- Blocking-based Technique (Saygin, Verykios, Clifton)
- Distortion-based (Sanitization) Technique – (Oliveira, Zaiane, Verykios, Dasseni)

Distortion-based Techniques

[Figure: Sample Database → Algorithm → Distorted Database]

- In the sample database, rule A→C has Support(A→C)=80% and Confidence(A→C)=100%.
- In the distorted database, rule A→C now has Support(A→C)=40% and Confidence(A→C)=50%.
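The figures above can be reproduced with a small sketch. The slide's actual table is not preserved in this transcript, so the five-transaction database below is a hypothetical one chosen only to match the stated percentages; the helper names `support` and `confidence` are likewise illustrative.

```python
# Hypothetical database: NOT the table from the slide, only one that
# yields the same support and confidence values for A -> C.
sample_db = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "C", "D"},
    {"A", "B", "C"},
    {"B", "D"},
]

def support(db, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(db, lhs, rhs):
    """support(lhs and rhs together) / support(lhs); 0.0 if lhs never occurs."""
    lhs_sup = support(db, lhs)
    return support(db, lhs | rhs) / lhs_sup if lhs_sup else 0.0

# Distortion: delete item C (turn a 1 into a 0) in two supporting transactions.
distorted_db = [set(t) for t in sample_db]
distorted_db[0].discard("C")
distorted_db[1].discard("C")

for name, db in (("sample", sample_db), ("distorted", distorted_db)):
    print(name, support(db, {"A", "C"}), confidence(db, {"A"}, {"C"}))
```

Deleting C from two of the four transactions that support A→C halves both measures, because the number of transactions containing A is unchanged.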

Distortion-based Techniques

- Challenges/Goals:
- To minimize the undesirable Side Effects that the hiding process causes to non-sensitive rules.
- To minimize the number of 1’s that must be deleted in the database.
- Algorithms must be linear in time as the database increases in size.

Our Proposal: Weight-based Sorting Distortion Algorithm (WSDA)

- High Level Description:
- Input:
- Initial Database
- Set of Sensitive Rules
- Safety Margin (for example 10%)
- Output:
- Sanitized Database
- Sensitive Rules no longer hold in the Database

WSDA Algorithm

- High Level Description:
- 1st step:
- Retrieve the set of transactions that support the sensitive rule RS
- For each sensitive rule RS, find the number N1 of transactions in which one item that supports the rule will be deleted

WSDA Algorithm

- High Level Description:
- 2nd step:
- For each rule Ri in the database that has items in common with RS, compute a weight w that denotes how strong Ri is
- For each transaction that supports RS, compute a priority Pi that denotes how many strong rules this transaction supports

WSDA Algorithm

- High Level Description:
- 3rd step:
- Sort the transactions that support RS in ascending order according to their priority value Pi
- 4th step:
- For the first N1 transactions hide an item that is contained in RS

WSDA Algorithm

- High Level Description:
- 5th step:
- Update confidence and support values for other rules in the database
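The five steps can be sketched as follows. The slides do not give the exact formulas, so several choices here are assumptions rather than the authors' definitions: the weight w of a rule is taken to be its confidence, the priority Pi of a transaction is the sum of the weights of the related rules it supports, N1 is the smallest number of deletions that pushes the confidence of RS below the threshold minus the safety margin, and the item deleted is simply the first item of the rule's right-hand side. The function name `wsda_sketch` and the example data are hypothetical.

```python
def supports(t, rule):
    """True if transaction t contains every item of the rule."""
    lhs, rhs = rule
    return (lhs | rhs) <= t

def confidence(db, rule):
    lhs, rhs = rule
    n_lhs = sum(1 for t in db if lhs <= t)
    n_both = sum(1 for t in db if supports(t, rule))
    return n_both / n_lhs if n_lhs else 0.0

def wsda_sketch(db, sensitive, other_rules, mct_pct, margin_pct):
    db = [set(t) for t in db]          # work on a copy
    lhs, rhs = sensitive
    # Step 1: transactions supporting RS, and N1 (assumed stopping rule,
    # computed in integer percentages to avoid rounding issues).
    supporting = [i for i, t in enumerate(db) if supports(t, sensitive)]
    n_lhs = sum(1 for t in db if lhs <= t)
    target_pct = mct_pct - margin_pct
    n1 = 0
    while n1 < len(supporting) and (len(supporting) - n1) * 100 >= target_pct * n_lhs:
        n1 += 1
    # Step 2: weights for rules sharing items with RS (assumed: confidence),
    # and a priority per supporting transaction (assumed: sum of weights).
    related = [r for r in other_rules if (r[0] | r[1]) & (lhs | rhs)]
    weight = {r: confidence(db, r) for r in related}
    priority = {i: sum(weight[r] for r in related if supports(db[i], r))
                for i in supporting}
    # Step 3: ascending priority, so transactions supporting few strong rules go first.
    supporting.sort(key=lambda i: priority[i])
    # Step 4: hide one item of RS in the first N1 transactions.
    victim = sorted(rhs)[0]
    for i in supporting[:n1]:
        db[i].discard(victim)
    # Step 5: recompute confidence for the affected rules.
    updated = {r: confidence(db, r) for r in related}
    return db, updated

# Hypothetical example data.
AC = (frozenset({"A"}), frozenset({"C"}))   # sensitive rule RS
BC = (frozenset({"B"}), frozenset({"C"}))   # non-sensitive rule sharing item C
db = [{"A", "B", "C"}, {"A", "C"}, {"A", "C", "D"},
      {"A", "B", "C"}, {"B", "D"}, {"A", "D"}]
before_bc = confidence(db, BC)
sanitized, updated = wsda_sketch(db, AC, [BC], mct_pct=60, margin_pct=10)
print(confidence(sanitized, AC), updated[BC])
```

In this example the two deletions fall on transactions that do not support B→C, so the sensitive rule drops below the margin while the non-sensitive rule keeps its confidence, which is the side-effect goal stated earlier.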

Experimental Results of WSDA Algorithm

[Figures: average number of items per transaction: 13/50 and 20/50]

Quality of Data

- Sometimes it is dangerous to delete items from the database (e.g., in medical databases), because the false data may create undesirable effects.
- So we have to hide the rules in the database by adding uncertainty, without distorting the database.

Blocking-based Techniques

[Figure: Initial Database → Algorithm → New Database]

Support and confidence become marginal (intervals rather than single values).

In the new database: 60% ≤ conf(A → C) ≤ 100%
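A small sketch of how such an interval arises. The slide's tables are not preserved in this transcript, so the blocked database below is hypothetical, chosen to give the same 60% to 100% range. The bounds are obtained by brute force over every way of replacing each ? with 0 or 1; this is an assumption made for illustration and may differ from the closed-form definitions used by the authors. The enumeration is exponential in the number of ?'s, so it only suits toy examples.

```python
from itertools import product

# Hypothetical blocked database over items A and C; "?" marks a blocked value.
blocked_db = [
    {"A": 1, "C": 1},
    {"A": 1, "C": 1},
    {"A": 1, "C": 1},
    {"A": 1, "C": "?"},
    {"A": 1, "C": "?"},
]

def completions(db):
    """Yield every database obtained by replacing each '?' with 0 or 1."""
    slots = [(i, item) for i, row in enumerate(db)
             for item, v in row.items() if v == "?"]
    for values in product((0, 1), repeat=len(slots)):
        filled = [dict(row) for row in db]
        for (i, item), v in zip(slots, values):
            filled[i][item] = v
        yield filled

def conf(db, lhs, rhs):
    """Confidence of lhs -> rhs on a fully known database (None if lhs never occurs)."""
    n_lhs = sum(1 for row in db if all(row[x] == 1 for x in lhs))
    n_both = sum(1 for row in db if all(row[x] == 1 for x in lhs + rhs))
    return n_both / n_lhs if n_lhs else None

def conf_interval(db, lhs, rhs):
    """Smallest and largest confidence over all completions of the '?' entries."""
    values = [c for c in (conf(d, lhs, rhs) for d in completions(db)) if c is not None]
    return min(values), max(values)

low, high = conf_interval(blocked_db, ("A",), ("C",))
print(low, high)
```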

Modification of Association Rule Definition

- The confidence and support of a rule A→B become marginal:

sup(A→B) ∈ [minsup(A→B), maxsup(A→B)]

conf(A→B) ∈ [minconf(A→B), maxconf(A→B)]

- minsup(A→B)=
- maxsup(A→B)=

Modification of Association Rule Definition

- minconf(A→B)=
- maxconf(A→B)=

Negative Border Rules Set (NBRS) Definition

- A rule R belongs to NBRS when either
- sup(R)>MST AND conf(R)<MCT

OR

- sup(R)<MST AND conf(R)>MCT,

where MST and MCT denote the minimum support and minimum confidence thresholds.
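The definition translates directly into a predicate. This is a minimal sketch; the function name `in_nbrs` and the example threshold values are illustrative, not taken from the slides.

```python
def in_nbrs(sup, conf, mst, mct):
    """True if a rule with the given support and confidence lies on the
    negative border: exactly one of the two thresholds is exceeded."""
    return (sup > mst and conf < mct) or (sup < mst and conf > mct)

# Illustrative thresholds: MST = 0.3, MCT = 0.6.
print(in_nbrs(0.4, 0.5, 0.3, 0.6))  # frequent but not confident enough
print(in_nbrs(0.2, 0.8, 0.3, 0.6))  # confident but not frequent enough
print(in_nbrs(0.4, 0.8, 0.3, 0.6))  # the rule holds, so it is not in NBRS
```

Such rules matter for blocking because a small change in the database can push them over the missing threshold, turning them into ghost rules.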

Privacy Breaches Definitions

- If an item i, some values of which are hidden by ?’s, is contained in a sensitive rule, a privacy breach occurs if the adversary can infer this with c% confidence.
- For a rule R with maxconf(R)>MCT, a privacy breach occurs if it can be estimated, with c% confidence, that R is either a sensitive or a ghost rule.
- For a blocked item i in a specific transaction T, a privacy breach occurs if the adversary can estimate with c% confidence that its original value is either 0 or 1.

Blocking-Based Techniques

- Goals that an algorithm has to achieve:
- To insert a relatively small number of ?’s while significantly reducing the confidence of sensitive rules.
- To minimize the undesirable side effects (rules and itemsets lost) by selecting the items in the appropriate transactions to change, and maximize the desirable side effects.
- To modify the database in a way that an adversary cannot recover the original values of the database.

Our Proposal: Blocking Algorithm (BA)

- High Level Description
- 1st step:
- For each sensitive rule RS (with left itemset IL and right itemset IR), compute how many 0’s and 1’s (N0 and N1, respectively) have to be blocked in order to reduce the confidence of RS.
- 2nd step:
- Find the set of transactions TR that support RS and the set of transactions TLpR’ that partially support RS (they partially support the left itemset and do not support the right itemset).
- For each transaction in TR, find the rules Rcommon that have at least one item in common with IR, and for each transaction in TLpR’, find the rules R’common∈NBRS that have at least one item in common with IL.
- Assign a weight w to each Rcommon and a weight w’ to each R’common.
- Assign a priority value PT to each transaction T in TR such that PT is large if T has many Rcommon rules with large w, and a priority value PT’ to each transaction T’ in TLpR’ such that PT’ is small if T’ has many R’common rules with large w’.

Blocking Algorithm

- High Level Description
- 3rd step:
- Sort the transactions T∈TR starting from those with the lowest PT, and sort the transactions T’∈TLpR’ starting from those with the highest PT’.
- 4th step:
- For the first N1 sorted T∈TR, block an item i∈IR, and for the first N0 sorted T’∈TLpR’, block an item i∈IL.
- 5th step:
- Update the values minconf(Ri) and minsup(Ri) for all other rules that have been affected.

Blocking-Based Techniques

- Main Problems of blocking technique:
- The maximum confidence of a sensitive rule cannot be reduced.
- If the blocking algorithm does not add much uncertainty to the database, an adversary can infer the hidden values by applying a smart inference technique.
- Both 0’s and 1’s must be hidden, because if only 1’s were hidden, the adversary would simply replace all the ?’s with 1’s and easily restore the initial database.
- Many ?’s must be inserted if we do not want an adversary to infer the hidden data.
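The third problem can be illustrated with a toy example. The binary database and the choice of blocked cells below are hypothetical; the sketch only shows that the "replace every ? with 1" attack restores the original exactly when only 1's were blocked, and fails once 0's are blocked as well.

```python
# Hypothetical 3x3 binary database (rows are transactions, columns are items).
original = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
]

def block(db, cells):
    """Return a copy of db with the given (row, column) cells replaced by '?'."""
    out = [row[:] for row in db]
    for r, c in cells:
        out[r][c] = "?"
    return out

def guess_all_ones(db):
    """The naive attack: assume every blocked value was originally a 1."""
    return [[1 if v == "?" else v for v in row] for row in db]

# Only 1's are blocked: the attack recovers the initial database.
only_ones = block(original, [(0, 0), (1, 2)])
# Both 1's and 0's are blocked: the same attack now guesses wrongly.
mixed = block(original, [(0, 0), (0, 2), (1, 1), (1, 2)])

print(guess_all_ones(only_ones) == original)
print(guess_all_ones(mixed) == original)
```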

[Figures: databases with an average of 13 and 20 items per transaction]

Experimental Results of Blocking Algorithm (2)

[Figure labels: Change the proportion 0:1; Decision Tree Experiments; Misclassified Items (%)]

Experimental Results of Blocking Algorithm (3)

Conclusions

- There are open research problems in the blocking technique:
- A) What techniques must be used in order to reduce the privacy breaches?
- B) In what other ways can we prevent an adversary from inferring the association rules in the database?
- C) Applying a chi-square test to the final database may reveal some correlations between the items.

References

- Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, and Johannes Gehrke, Privacy Preserving Mining of Association Rules, SIGKDD 2002, Edmonton, Alberta, Canada.
- Murat Kantarcioglu and Chris Clifton, Privacy Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data, In Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2002), 24–31.
- Jaideep Vaidya and Chris Clifton, Privacy Preserving Association Rule Mining in Vertically Partitioned Data, In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2002), 639–644.
- Stanley R. M. Oliveira and Osmar R. Zaïane, Algorithms for Balancing Privacy and Knowledge Discovery in Association Rule Mining, In Proc. of the Seventh International Database Engineering & Applications Symposium (IDEAS'03), pp. 54–63, Hong Kong, July 16-18, 2003.
- Yucel Saygin, Vassilios Verykios, and Chris Clifton, Using Unknowns to Prevent Discovery of Association Rules, SIGMOD Record 30 (2001), no. 4, 45–54.
- Vassilios S. Verykios, Ahmed K. Elmagarmid, Elisa Bertino, Yucel Saygin, and Elena Dasseni, Association Rule Hiding, IEEE Transactions on Knowledge and Data Engineering (2003).
