Privacy Framework for RDF Data Mining

Privacy Framework for RDF Data Mining. Master’s Thesis Project Proposal by Yotam Aron. Overview: Motivation and Goal, Background, Proposed Solution and Design, Example, Conclusion. Motivation: Data mining continues to become more widespread. Useful for research, public policy, etc.

Presentation Transcript


  1. Privacy Framework for RDF Data Mining • Master’s Thesis Project Proposal • By: Yotam Aron

  2. Overview • Motivation and Goal • Background • Proposed Solution and Design • Example • Conclusion

  3. Motivation • Data mining continues to become more widespread. • Useful for research, public policy, etc. • Want to maintain the privacy of participants in the database. • Little work has been done on privacy for semantic web data.

  4. Previous Work • Anonymization • K-Anonymity [1] • Differential privacy systems: PINQ [2], AIRAVAT [3]. • Drawbacks: • Do not apply to semantic web data. • Do not support SPARQL.

  5. Goal • Develop a system to protect dataset participants’ personal data in SPARQL. • Integrates well with existing SPARQL endpoints. • Relatively easy for the user and the administrator to use.

  6. Background • Rule-based Privacy Policies in AIR • Differential Privacy

  7. Rule-based Privacy Policies in AIR [4] • Rules define patterns in a SPARQL query. • If a pattern is matched, the rule infers compliance or non-compliance of the incoming SPARQL query.

  8. AIR Example [5] • AIR will show that the query is non-compliant with Policy4.
     AIR Policy (extract):
       air:if { :W s:TriplePattern :T .
                :T log:includes { :X type:F :V } . } ;
       air:then [ air:description ("type:F was selected in " q:QUERY) ;
                  air:assert { q:QUERY air:non-compliant-with q:Policy4 . } ] .
     Query:
       SELECT ?s WHERE { ?s type:F ?p }
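
The effect of the rule above can be imitated in a few lines of Python. This is only a rough sketch, not the AIR reasoner: `triple_patterns` and `check_policy` are hypothetical helpers, the regular-expression parsing handles only very simple SELECT queries, and the forbidden predicate `type:F` is taken from the slide.

```python
import re

def triple_patterns(query: str) -> list[tuple[str, str, str]]:
    """Extract (subject, predicate, object) patterns from a simple WHERE block.

    Hypothetical helper: a real system would use a full SPARQL parser.
    """
    body = re.search(r"\{(.*)\}", query, re.DOTALL)
    if body is None:
        return []
    patterns = []
    for chunk in body.group(1).split("."):
        tokens = chunk.split()
        if len(tokens) == 3:
            patterns.append((tokens[0], tokens[1], tokens[2]))
    return patterns

def check_policy(query: str, forbidden_predicate: str = "type:F") -> tuple[str, str]:
    """Return a verdict and a description, loosely mirroring air:assert and air:description."""
    hits = [t for t in triple_patterns(query) if t[1] == forbidden_predicate]
    if hits:
        return ("non-compliant", f"{forbidden_predicate} was selected in the query")
    return ("compliant", "")
```

Calling `check_policy("SELECT ?s WHERE {?s type:F ?p}")` flags the slide's query, while a query that never mentions `type:F` passes.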

  9. Differential Privacy Overview • Minimize probability of privacy breach. • Maximize statistical accuracy. • Definition requires that, given two similar datasets, a query function on those two datasets gives similar results with high probability. • Makes no assumptions on the underlying dataset.

  10. Differential Privacy • Definition: We say a randomized computation M provides ɛ-differential privacy if, for any two data sets A and B and any set of possible outputs S ⊆ Range(M), Pr[M(A) ∈ S] ≤ Pr[M(B) ∈ S] × exp(ɛ × |A ⊕ B|).

  11. Differential Privacy in Practice • Each user is given an ɛ value that cannot be exceeded. • Each query qi has some noise value ɛi. In total, the user’s queries must satisfy the property Σi ɛi ≤ ɛ. • Noise (usually Laplace), which depends on the aggregate function, is added with variance 2(Δf/ɛi)² for the standard Laplace mechanism, where Δf is the sensitivity of the aggregate function.
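
A minimal sketch of this bookkeeping, assuming the standard Laplace mechanism (noise scale sensitivity/ɛi, hence variance 2·(sensitivity/ɛi)²) and a simple additive budget. The names `PrivacyBudget`, `laplace_noise`, and `noisy_answer` are illustrative and do not come from the proposal.

```python
import random

class PrivacyBudget:
    """Tracks how much of a user's total epsilon has been spent."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        """Spend epsilon, refusing if the running sum would exceed the total."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:
            raise PermissionError("privacy budget exceeded")
        self.spent += epsilon

def laplace_noise(sensitivity: float, epsilon: float, rng: random.Random) -> float:
    """Sample Laplace noise with scale sensitivity/epsilon.

    The difference of two independent exponentials with mean `scale`
    is Laplace-distributed with that scale.
    """
    scale = sensitivity / epsilon
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_answer(true_value: float, sensitivity: float, epsilon: float,
                 budget: PrivacyBudget, rng: random.Random) -> float:
    """Charge the budget first, then return the true value plus noise."""
    budget.charge(epsilon)
    return true_value + laplace_noise(sensitivity, epsilon, rng)
```

With a total budget of 2.0, two queries at ɛi = 1.0 succeed and a third raises `PermissionError`. Charging before answering means a refused query reveals nothing about the data.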

  12. Limitations of Differential Privacy • Only statistical data protected. • High variance in data yields poor query results. • Theory not always perfect in practice. • Assumes no collusion among users. • Covert channel attacks [6]. • What value of ɛ to choose?

  13. Example, No DP • SELECT COUNT(Name) WHERE (Age < 25) • Result: 2

  14. Example, No DP • SELECT COUNT(Name) WHERE (Age < 25) • Result: 1 • Big difference in answers!

  15. Example, With DP • SELECT COUNT(Name) WHERE (Age < 25) • Result: 2 + noise = ~2 (with high probability)

  16. Example, With DP • SELECT COUNT(Name) WHERE (Age < 25) • Result: 1 + noise = ~2 (with high probability) • With high probability, records are indistinguishable!
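
One way to make "indistinguishable" concrete: assuming the noise is Laplace with scale 1/ɛ (a count has sensitivity 1; the slides do not state the mechanism's parameters, so this is an assumption), the density of any noisy output under a true count of 2 is at most exp(ɛ) times its density under a true count of 1. The sketch below checks that bound numerically on a grid; `laplace_pdf` and `max_density_ratio` are illustrative names.

```python
import math

def laplace_pdf(x: float, center: float, scale: float) -> float:
    """Density of a Laplace distribution centered at `center`."""
    return math.exp(-abs(x - center) / scale) / (2.0 * scale)

def max_density_ratio(count_a: float, count_b: float, epsilon: float,
                      lo: float = -20.0, hi: float = 20.0, steps: int = 4001) -> float:
    """Largest ratio pdf_a(x) / pdf_b(x) over an evenly spaced grid of outputs x."""
    scale = 1.0 / epsilon  # assumes sensitivity 1, as for COUNT
    worst = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        ratio = laplace_pdf(x, count_a, scale) / laplace_pdf(x, count_b, scale)
        worst = max(worst, ratio)
    return worst
```

For example, `max_density_ratio(2, 1, 0.5)` stays within `math.exp(0.5)`, which matches the definition with |A ⊕ B| = 1. A pointwise bound on densities implies the same bound for every output set S.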

  17. Practical Consequences of DP • An individual’s inclusion in the dataset is not likely a privacy risk. • The answers to the queries can still be useful.

  18. Achieving Differential Privacy in RDF • Current techniques for differential privacy are developed for relational databases. • As a first approximation, reduce the triple-store to a relational database. • Improve the mechanism as the project progresses.

  19. Example of RDF-RDBS Reduction
      :Person1 foaf:name "Alice" ; foaf:member :DIG ; foaf:age "21" ; foaf:knows :Person2, :Person3 .
      :Person2 foaf:name "Bob" ; foaf:member :DIG ; foaf:knows :Person3 .
      :Person3 foaf:name "Charlie" ; foaf:age "22" .
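
The relational side of this slide did not survive in the transcript. As a sketch of one possible reduction (one row per subject, one column per predicate, which is an assumption rather than the proposal's stated scheme), the triples above can be pivoted as follows. `to_rows` is a hypothetical helper, and multi-valued predicates are simply joined into one cell, which a real system would likely normalize into a separate table.

```python
from collections import defaultdict

# The triples from the slide, written out as (subject, predicate, object).
TRIPLES = [
    (":Person1", "foaf:name", "Alice"),
    (":Person1", "foaf:member", ":DIG"),
    (":Person1", "foaf:age", "21"),
    (":Person1", "foaf:knows", ":Person2"),
    (":Person1", "foaf:knows", ":Person3"),
    (":Person2", "foaf:name", "Bob"),
    (":Person2", "foaf:member", ":DIG"),
    (":Person2", "foaf:knows", ":Person3"),
    (":Person3", "foaf:name", "Charlie"),
    (":Person3", "foaf:age", "22"),
]

def to_rows(triples):
    """Pivot triples into (columns, rows): one row per subject, one column per predicate."""
    columns = sorted({p for _, p, _ in triples})
    cells = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        cells[s][p].append(o)
    rows = []
    for s in sorted(cells):
        row = {"subject": s}
        for col in columns:
            values = cells[s][col]
            # Missing predicates become None, like a NULL in a relational table.
            row[col] = ", ".join(values) if values else None
        rows.append(row)
    return columns, rows
```

In the resulting table, Bob's `foaf:age` and Charlie's `foaf:member` are empty, which illustrates why a naive reduction has to cope with sparse and multi-valued columns.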

  20. Proposed Solution • SPARQL Privacy Insurance Module (SPIM) • Build a layer between the user and the endpoint. • Integrate both AIR and differential privacy. • Integrate a credential-checking system. • Modify an existing differential privacy framework for use with triple-stores.

  21. Contributions • Complete privacy protection for triplestores. • Differential privacy sensitivity for SPARQL 1.1 aggregate functions including count, sum, avg, min, and max.
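
The proposal does not list the sensitivity values themselves. As a hedged illustration only, the sketch below uses common textbook choices, assuming each participant contributes a single value clamped to a known range [lo, hi] and that neighboring datasets differ by adding or removing one participant. `global_sensitivity` is a hypothetical function; avg is deliberately left out because it is usually derived from a noisy sum and a noisy count rather than given its own bound.

```python
def global_sensitivity(aggregate: str, lo: float, hi: float) -> float:
    """Worst-case change in an aggregate when one clamped value is added or removed.

    Assumes every contributed value lies in [lo, hi]; these are textbook
    bounds, not the values derived in the thesis.
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    if aggregate == "count":
        return 1.0  # one participant changes a count by at most 1
    if aggregate == "sum":
        return float(max(abs(lo), abs(hi)))  # largest magnitude one value can have
    if aggregate in ("min", "max"):
        return float(hi - lo)  # one record can move the extreme across the range
    raise ValueError(f"no sensitivity bound defined here for {aggregate!r}")
```

Under these assumptions a sum over values in [0, 500] would need noise scaled to 500/ɛ, which shows why wide value ranges make sum, min, and max answers much noisier than counts.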

  22. System Overview

  23. System diagram components: Policy Files • AIR Rule Based Privacy • TAAC Credential Checking • SPARQL Endpoint • SPIM Privacy Module • User Interface • User Data • Differential Privacy Module • Service Description

  25. TAAC will: • Verify that the user has permission to access the triple-store. • Send data about the user to the central module.

  26. SPIM: • Controls the order of privacy operations. • Interfaces with the SPARQL endpoint.

  27. AIR: • Reasoner that uses rule-based policies to check queries for privacy hazards. • Extracts information for differential privacy.

  28. Policy Files: • Contain the rules for AIR.

  29. Differential Privacy Module: • Checks query limits (based on ɛ use). • Applies noise to statistical data.

  30. User Data: • Contains user ɛ data.

  31. SPIM: • Controls the order of privacy operations. • Interfaces with the SPARQL endpoint.

  32. Service Description: • Contains information to be used for the addition of noise.

  33. Miscellaneous: • Interface to SPARQL Endpoint • Transaction File • Improved Differential Privacy Output • Service Description Generator

  34. Potential Extensions: • Robustness against attacks • Concurrency • Optimization for large systems • Customizable UI • Accountability

  35. Sample Scenario • Triplestore data mining in biotechnological applications. • Biofirm provides data about hospitals in the US. • Alice is a PhD student at MIT. • Alice would like to query Biofirm’s database for research purposes. She just got permissions yesterday and is logging in for the first time.

  36. Preprocessing • Biofirm installs SPIM, and runs the service description generation code. • May need to create the correct interface. • Makes sure the UI is accessible online.

  37. Sample Compliant Query • Alice would like to know the total number of visits that Boston hospitals received.
      SELECT (SUM(?s) AS ?people) WHERE {
        ?h a biofirm:Hospital .
        ?h biofirm:visits ?s .
        ?h biofirm:location geo:Boston .
      }
      Epsilon value: 1.0

  38. Alice enters the query into the provided user interface.

  39. TAAC ensures that Biofirm has given Alice access to its triple-store.

  40. The query request arrives at the SPIM central module.

  41. Policyrunner is called upon to check the query for triple patterns that are in violation. • No violations found. • Since this is Alice’s first time, AIR extracts what type of permissions Alice has.

  42. SPIM creates a profile for Alice. • Gives her an ɛ value (suppose it is 2.0). • Stores it in the triple-store.

  43. SPIM extracts which variables will yield statistical results and will have differential privacy applied.

  44. The Differential Privacy module ensures that the query will not exceed Alice’s given epsilon value.

  45. This is Alice’s first time, so her epsilon value is 2.0, and the epsilon for this query is 1.0. Everything looks good.

  46. The query is sent to the endpoint. • Results are received.

  47. The Differential Privacy module adds noise to the appropriate fields and updates epsilon values.

  48. SPIM is ready to return the results.

  49. Alice receives the results.
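
The walk-through above can be condensed into a control-flow sketch. Everything here is hypothetical: the `Spim` class, its fields, and the status strings are illustrative names, the credential and policy checks are reduced to set lookups, Laplace noise with scale sensitivity/ɛ is assumed, and the endpoint is stood in for by a callable. Only the ordering of steps and the 2.0 starting budget follow the slides.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Spim:
    authorized_users: set          # stands in for TAAC credential checking
    forbidden_predicates: set      # stands in for AIR policy files
    default_epsilon: float = 2.0   # budget given to a first-time user
    remaining: dict = field(default_factory=dict)  # stands in for User Data
    rng: random.Random = field(default_factory=random.Random)

    def handle(self, user, predicates, query_epsilon, sensitivity, run_query):
        # 1. Credential check.
        if user not in self.authorized_users:
            return {"status": "denied"}
        # 2. Rule-based policy check on the query's predicates.
        if self.forbidden_predicates & set(predicates):
            return {"status": "non-compliant"}
        # 3. Create a profile on first use, otherwise load the remaining budget.
        budget = self.remaining.setdefault(user, self.default_epsilon)
        # 4. Refuse queries that would exceed the remaining epsilon.
        if query_epsilon <= 0 or query_epsilon > budget:
            return {"status": "budget-exceeded"}
        # 5. Send the query to the endpoint (here: any zero-argument callable).
        true_value = run_query()
        # 6. Add Laplace noise and update the stored epsilon.
        scale = sensitivity / query_epsilon
        noise = self.rng.expovariate(1.0 / scale) - self.rng.expovariate(1.0 / scale)
        self.remaining[user] = budget - query_epsilon
        return {"status": "ok", "value": true_value + noise,
                "remaining_epsilon": self.remaining[user]}
```

In Alice's scenario, her first query with ɛ = 1.0 returns a noisy sum and leaves 1.0 of her 2.0 budget. A second identical query exhausts it, and a third is refused before the endpoint is contacted.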

  50. Summary • System will combine rule-based privacy with differential privacy. • Develop differential privacy techniques for semantic web data. • Make privacy module client and administrator friendly.
