
Privacy-oriented Data Mining by Proof Checking



Presentation Transcript


  1. Privacy-oriented Data Mining by Proof Checking Stan Matwin (joint work with Amy Felty), SITE, University of Ottawa, Canada stan@site.uottawa.ca

  2. The TAMALE Group • 4 profs • Some 30 graduate students • Areas: machine learning, data mining, text mining, NLP, data warehousing • Research in: Inductive Logic Programming; text mining; learning in the presence of knowledge; applications of ML/DM (e.g., in SE: tools for maintenance personnel)

  3. Why did I get into this research? • what is already being done… and why it's not enough • the main idea • its operation • discussion: 'correctness' • prototype: Coq and CIC • example • some technical challenges • acceptance?

  4. Some useful concepts... • opting out vs. opting in • Use Limitation Principle (ULP): data should be used only for the explicit purpose for which it has been collected

  5. …and existing technical proposals. On the web: P3P (Platform for Privacy Preferences) • W3C standard • XML specifications, on websites and in browsers, of what can be collected and for what purpose; the ULP? • Handles cookies • A data exchange protocol more than a privacy protocol: no provisions for opting out after an initial opt-in • the ULP part is in natural language… not verifiable

  6. Agrawal's data perturbation transformations • data is perturbed by random distortion: x_i → x_i + r, with r drawn from a uniform or Gaussian distribution • a procedure to reconstruct a PAC-estimate of the original distribution (but not the individual values) • a procedure to build an accurate decision tree on the perturbed distribution
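A minimal sketch of the perturbation step (an illustration only, not Agrawal's actual code; function and variable names are assumptions): each sensitive value x_i is released as x_i + r, so the miner never sees the originals. The nontrivial part of the scheme, reconstructing the distribution from the perturbed values, is omitted here.

    import numpy as np

    def perturb(values: np.ndarray, scale: float = 1.0,
                noise: str = "gaussian") -> np.ndarray:
        """Release x_i + r, with r uniform on [-scale, scale] or N(0, scale^2)."""
        rng = np.random.default_rng()
        if noise == "uniform":
            r = rng.uniform(-scale, scale, size=values.shape)
        else:
            r = rng.normal(0.0, scale, size=values.shape)
        return values + r

    ages = np.array([23.0, 35.0, 41.0, 58.0])
    print(perturb(ages, scale=5.0))  # e.g. [25.8 31.2 44.0 60.3]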

  7. Agrawal's transformations, cont'd • proposes a measure to quantify privacy: estimate intervals and their size • lately extended to non-numerical attributes and to association rules • does not address the ULP • how do we know it is applied?

  8. The main idea: towards a verifiable ULP • User sets permissions: what can and cannot be done with her data • Any claim that a piece of software respects these permissions becomes a theorem about that software • Verifying the claim is then checking a proof of that theorem against the software

  9. Who are the players? • User C • Data miner Org • Data mining software developer Dev • Independent verifier Veri • …BUT no one owns the data D

  10. D: database scheme • A: given set of database and data mining operations • S: source code for A • PC(D,A): C's permissions • T(PC,S): theorem that S respects PC • R(PC,S): proof of T(PC,S) • B: binary code of S

  11. Discussion: properties • It can be proven that C's permissions are respected (or not): PC is in fact a verifiable ULP • PC can be negative (opt-out) or positive (opt-in) • proof construction needs to be done only once for a given PC, D, and A • the scheme is robust against cheating by Dev or Org

  12. Acceptance issues • No Org will give Veri access to S • Too much overhead to check R(PC,S) for each task and each user • Too cumbersome for C • Based on all Orgs buying in

  13. Acceptance 1: Veri's operation (access) • Veri needs: PC from C; R(PC,S) from Dev; S from Dev; B from Org • Veri could check R(PC,S) at Dev's site • Veri needs to verify that S (belonging normally to Dev) corresponds to the B that Org runs.
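A minimal sketch of Veri's two checks (file names and the helper are assumptions, not the authors' tooling): type-check the proof R against the theorem T(PC,S), and match the binary B that Org runs against the audited source S.

    import subprocess

    def check_proof(coq_file: str) -> bool:
        # coqc type-checks the Coq file containing T(PC,S) and its proof
        # R(PC,S); it exits with status 0 exactly when the proof is accepted.
        return subprocess.run(["coqc", coq_file]).returncode == 0

    def check_binary(source_tar: str, binary_path: str) -> bool:
        # Placeholder for the S-to-B correspondence check; the watermarking
        # scheme that implements it is sketched on slide 21.
        raise NotImplementedError

    def veri(coq_file: str, source_tar: str, binary_path: str) -> bool:
        return check_proof(coq_file) and check_binary(source_tar, binary_path)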

  14. Acceptance 2: overhead • Veri runs proof checking on a control basis (spot checks rather than every task) • what is Org's execution overhead?

  15. Issues • Naming the fields: XML or disclosure • a restricted class of theorems for a given PC: automating proof techniques for this class

  16. Acceptance 3: C's perspective • Building PCs must be easy for C, based on D and the processing schema; initially a closed set? • permissions could be encoded on a credit card, a smart card, or in an electronic wallet • or with the CA; they can then be dynamically modified and revoked

  17. « Political » aspects: who is Veri? • generally trusted: a « consumer association »? « Ralph Nader »? « Transparency International »? • an IT expert at the level of instrumenting and running the proof checker: a connection to the Open Software Foundation? • theorem proving can be cast as « better testing »

  18. How to make Orgs buy in? • A first Org is needed to volunteer • a Green Data Mining logo will be granted and administered (verified) by Veri • other Orgs will then have an incentive to join

  19. Future work • build the tools • expand the prototype • extend from Weka to commercial data mining packages • integrate with P3P? • find a willing Org


  21. Link between S and B • simply recompiling S is not an option • watermarking solution: B is watermarked by a slightly modified compiler with MD5(tar(S)) (128 bits = 16 bytes) • marks are inserted by a trusted makefile-and-compiler in locations in B given by Veri and unknown to Org
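A sketch of the watermark check (the hashing API is standard; the offset and file-layout details are assumptions): the trusted build embeds the 16 bytes of MD5(tar(S)) into B at secret offsets chosen by Veri; Veri later recomputes the digest from S and compares it with the bytes found at those offsets.

    import hashlib

    def source_digest(source_tar_path: str) -> bytes:
        # MD5 of tar(S): 128 bits = 16 bytes.
        with open(source_tar_path, "rb") as f:
            return hashlib.md5(f.read()).digest()

    def watermark_matches(binary: bytes, offsets: list[int],
                          digest: bytes) -> bool:
        # True iff the i-th digest byte sits at the i-th secret offset in B.
        return all(binary[o] == b for o, b in zip(offsets, digest))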

  22. Link… • Veri, given access to S, can verify that B corresponds to S • An attack by I requires hacking the compiler • An attack by Org requires knowing the locations of the watermarks

  23. Example: C restricts her Employee data from participating in a join with her Payroll data. Record Payroll : Set := mkPay {PID : nat; JoinInd : bool; Position : string; Salary : nat}. Record Employee : Set := mkEmp {Name : string; EID : nat; …}. Record Combined : Set := mkComb {CID : nat; CName : string; Csalary : nat; …}.

  24. Fixpoint Join [Ps : list Payroll] : (list Employee) -> (list Combined) := [Es : list Employee] Cases Ps of nil => (nil Combined) | (cons p ps) => (app (check_JoinInd_and_find_employee_record p Es) (Join ps Es)) end. (check_JoinInd_and_find_employee_record p Es): if a record is found in Es whose EID matches p's PID and whose JoinInd permits the Join, a one-element list with the result of the Join is returned; otherwise the empty list.
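A Python rendering of what Join computes (an illustration under assumed field names, not part of the prototype): a payroll record contributes to the output only when its JoinInd flag permits it and a matching employee exists.

    from dataclasses import dataclass

    @dataclass
    class Payroll:
        pid: int
        join_ind: bool   # C's permission bit: may this record be joined?
        position: str
        salary: int

    @dataclass
    class Employee:
        name: str
        eid: int

    @dataclass
    class Combined:
        cid: int
        cname: str
        csalary: int

    def join(ps: list[Payroll], es: list[Employee]) -> list[Combined]:
        out = []
        for p in ps:
            if p.join_ind:                    # permission check first
                for e in es:
                    if e.eid == p.pid:        # then the usual key match
                        out.append(Combined(p.pid, e.name, p.salary))
                        break
        return out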

  25. Definition PC := [S : (list Payroll) → (list Employee) → (list Combined)] ∀Ps : list Payroll. ∀Es : list Employee. (UniqueJoinInd Ps) → ∀P : Payroll. (In P Ps) → ((JoinInd P) = false → ¬∃C : Combined. ((In C (S Ps Es)) ∧ ((CID C) = (PID P)))) • PC(S) is written as (PC Join): Coq expands the definition of PC and provides the theorem • a request to the proof-checking operator of Coq will check this proof, i.e., it will check that the user's permissions are encoded into the given Join program • Whole proof: 300 lines of Coq code; proof checking: 1 sec on a 600 MHz machine
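A dynamic-check analogue of PC, reusing the Python types from the previous sketch (an illustration only; the point of the scheme is that this property is proved statically, once, rather than tested at run time): no payroll record whose join_ind is False may surface in the join's output.

    def respects_pc(ps: list[Payroll], output: list[Combined]) -> bool:
        blocked = {p.pid for p in ps if not p.join_ind}
        return all(c.cid not in blocked for c in output)

    ps = [Payroll(1, True, "analyst", 70000), Payroll(2, False, "manager", 90000)]
    es = [Employee("Ada", 1), Employee("Bob", 2)]
    assert respects_pc(ps, join(ps, es))  # record 2 never appears in the output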
