
Mining Multiple Private Databases

Top-k Queries Across Multiple Private Databases (2005), by Li Xiong (Emory University), Subramanyam Chitti (GA Tech), and Ling Liu (GA Tech). Presented by: Cesar Gutierrez.


Presentation Transcript


  1. Top-k Queries Across Multiple Private Databases (2005) • Li Xiong (Emory University) • Subramanyam Chitti (GA Tech) • Ling Liu (GA Tech) • Presented by: Cesar Gutierrez • Mining Multiple Private Databases

  2. About Me • ISYE Senior and CS minor • Graduating December, 2008 • Humanitarian Logistics and/or Supply Chain • Originally from Lima, Peru • Travel, paintball and politics

  3. Outline • Intro. & Motivation • Problem Definition • Important Concepts & Examples • Private Algorithm • Conclusion

  4. Introduction • Technology has lowered the barriers to information sharing • Growing need for distributed data-mining tools that preserve privacy • Trade-off among accuracy, efficiency, and privacy

  5. Motivating Scenarios • CDC needs to study insurance data to detect disease outbreaks: disease incidents, disease seriousness, patient background • Legal and commercial problems prevent the release of policyholders' information

  6. Motivating Scenarios (cont'd) • Industrial trade group collaboration • Useful pattern: "manufacturing processes using chemical supplies from supplier X have high failure rates" • Trade secret: "manufacturing process Y gives low failure rate"

  7. Problem & Assumptions • Model: n nodes, horizontal partitioning • Assume Semi-honesty: • Nodes follow specified protocol • Nodes attempt to learn additional information about other nodes ...

  8. Challenges • Why not use a Trusted Third Party (TTP)? • Difficult to find one that is trusted • Increased danger from single point of compromise • Why not use secure multi-party computation techniques? • High communication overhead • Feasible for small inputs only

  9. Recall Our 3-D Goal • Accuracy • Efficiency • Privacy

  10. Private Max • Actual data sent on first pass • Static starting point • [Figure: ring of four nodes (1 to 4) with a known start node; values shown are 10, 20, 30, and 40]
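The single-pass scheme on this slide can be sketched as follows. This is a minimal illustration, not the presenters' code: the function name `naive_ring_max`, the node ordering, and the sample values are assumptions chosen to resemble the figure. It shows why the fixed start is a privacy problem: the first message on the ring is the start node's real value.

```python
def naive_ring_max(values, start=0):
    """Pass a running maximum once around a ring of nodes.

    `values[i]` is the private value held by node i (illustrative data).
    The start node is fixed and known, so its first message exposes
    its actual value to its successor.
    """
    n = len(values)
    running = values[start]      # real data sent on the first hop
    trace = [running]            # messages seen on the ring, in order
    for step in range(1, n):
        node = (start + step) % n
        running = max(running, values[node])
        trace.append(running)
    return running, trace


result, messages = naive_ring_max([30, 10, 40, 20])
print(result, messages)
```

Each successor learns a lower bound on its predecessors' values, and the node after the start learns the start node's value exactly.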

  11. Multi-Round Max • Randomly perturbed data passed to successor during multiple passes • No successor can determine actual data from its predecessor • Randomized starting point • [Figure: ring of nodes with a randomized start; values passed over multiple rounds include 0, 18, 32, 35, and 40]
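A rough sketch of a multi-round randomized pass is given below. Several details are assumptions that the slide does not spell out: values are non-negative integers, the running value starts at 0, and in round r a node holding a larger value hides it with probability `p0 * d ** (r - 1)` by sending a random value below its own. The names `multi_round_max`, `p0`, `d`, and `rounds` are illustrative.

```python
import random


def multi_round_max(values, rounds=6, p0=1.0, d=0.5, rng=None):
    """Randomized ring protocol sketch for computing a maximum.

    Assumed schedule: the probability of sending a perturbed value in
    round r is p0 * d**(r - 1), so randomization fades in later rounds.
    """
    rng = rng or random.Random()
    n = len(values)
    start = rng.randrange(n)     # randomized starting point
    g = 0                        # running value passed along the ring
    for r in range(1, rounds + 1):
        p_r = p0 * d ** (r - 1)
        for step in range(n):
            v = values[(start + step) % n]
            if v > g:
                if rng.random() < p_r:
                    # hide the real value: send something in [g, v)
                    g = rng.randrange(g, v)
                else:
                    g = v        # reveal the real value
            # otherwise the node forwards g unchanged
    return g


print(multi_round_max([30, 10, 40, 20], rng=random.Random(1)))
```

In this sketch the passed value never decreases and never exceeds the true maximum, so a successor cannot tell whether an incoming value is real or perturbed. With too few rounds the result may still fall short of the true maximum, which is the accuracy side of the trade-off.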

  12. Evaluation Parameters • Large k = "avoid information leaks" • Large d = more randomization = more privacy • Small d = more accurate (deterministic) • Large r = "as accurate as ordinary classifier"
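The effect of d and r on accuracy can be made concrete under one assumed scheme: in round r, the node holding the maximum sends a random smaller value with probability `p0 * d ** (r - 1)` and its real value otherwise. If exactly one node holds the maximum, the result is wrong only when that node randomizes in every round. The helper below is hypothetical and only evaluates that product.

```python
def prob_exact_max(rounds, p0=1.0, d=0.5):
    """Probability that the true maximum is revealed within `rounds` rounds,
    assuming a per-round randomization probability of p0 * d**(r - 1)
    and a single node holding the maximum."""
    fail = 1.0
    for r in range(1, rounds + 1):
        fail *= p0 * d ** (r - 1)   # max-holder randomizes again in round r
    return 1.0 - fail


for d in (0.25, 0.5, 0.75):
    print(d, [round(prob_exact_max(r, 1.0, d), 3) for r in range(1, 6)])
```

Under these assumptions a smaller d or a larger r pushes the probability toward 1, which matches the slide's "small d = more accurate" and "large r" bullets, while a larger d keeps randomization active for more rounds.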

  13. Accuracy Results

  14. Varying Rounds

  15. Privacy Results

  16. Conclusion • Problems Tackled • Preserving efficiency and accuracy while introducing provable privacy to the system • Improving a naive protocol • Reducing privacy risk in an efficient manner

  17. Critique • Dependency on other research papers in order to obtain a full understanding • Few/No Illustrations • A real life example would have created a better understanding of the charts
