Mining multiple private databases

Top k Queries Across Multiple Private Databases (2005), by Li Xiong (Emory University), Subramanyam Chitti (GA Tech), and Ling Liu (GA Tech). Presented by: Cesar Gutierrez.

Presentation Transcript

Topk Queries Across Multiple Private Databases (2005)

Li Xiong (Emory University)

Subramanyam Chitti (GA Tech)

Ling Liu (GA Tech)

Presented by: Cesar Gutierrez

Mining Multiple Private Databases


About Me

  • ISYE Senior and CS minor

  • Graduating December, 2008

  • Humanitarian Logistics and/or Supply Chain

  • Originally from Lima, Peru

  • Travel, paintball and politics


Outline

  • Intro. & Motivation

  • Problem Definition

  • Important Concepts & Examples

  • Private Algorithm

  • Conclusion


Introduction

  • ↓ information-sharing restrictions, thanks to technology

  • ↑ need for distributed data-mining tools that preserve privacy

  • Trade-off among three goals

    • Accuracy

    • Efficiency

    • Privacy


Motivating Scenarios

  • CDC needs to study insurance data to detect disease outbreaks

    • Disease incidents

    • Disease seriousness

    • Patient Background

  • Legal/commercial problems prevent the release of policyholders' information


Motivating Scenarios (cont'd)

  • Industrial trade group collaboration

    • Useful pattern: "manufacturing processes using chemical supplies from supplier X have high failure rates"

    • Trade secret: "manufacturing process Y gives a low failure rate"


Problem & Assumptions

  • Model: n nodes, horizontal partitioning

  • Assume Semi-honesty:

    • Nodes follow specified protocol

    • Nodes attempt to learn additional information about other nodes

...


Challenges

  • Why not use a Trusted Third Party (TTP)?

    • Difficult to find one that is trusted

    • Increased danger from single point of compromise

  • Why not use secure multi-party computation techniques?

    • High communication overhead

    • Feasible for small inputs only


Recall Our 3-D Goal

  • Accuracy

  • Efficiency

  • Privacy


Private Max

  • Actual data is sent on the first pass

  • The static starting point is known

[Figure: ring of four nodes, numbered 1 to 4, with a marked start node. Numbers visible in the figure: 10, 20, 30, 40.]
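The single-pass protocol this slide describes can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the function name `naive_ring_max`, the fixed `start` index, and the assignment of the values 30, 10, 40, 20 to consecutive nodes are all assumptions made for the example.

```python
def naive_ring_max(values, start=0):
    """Pass a running maximum once around a ring, beginning at a fixed node.

    Hypothetical helper for illustration. Returns the final maximum and the
    list of messages, one per node, in the order they were sent.
    """
    n = len(values)
    running = values[start]           # the start node sends its real value
    messages = [running]
    for step in range(1, n):
        node = (start + step) % n
        running = max(running, values[node])
        messages.append(running)      # the successor sees this value in the clear
    return running, messages


result, messages = naive_ring_max([30, 10, 40, 20])
print(result)    # 40
print(messages)  # [30, 30, 40, 40]
```

The message list shows both weaknesses named in the bullets: the first message is the start node's actual value, and because the start node is fixed, its successor always knows whose value it is looking at.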


Multi-Round Max

  • Randomly perturbed data passed to successor during multiple passes

  • No successor can determine the actual data of its predecessor

  • Randomized Starting Point

[Figure: ring of four nodes (labels D2, D3, and D4 are visible) with a marked start node. Numbers visible in the figure: 0, 10, 18, 20, 30, 32, 35, 40.]
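One way to picture the multi-round idea is the sketch below. It is a guess at the mechanics under stated assumptions, not the authors' implementation: the name `multi_round_max`, the initial global value of 0, non-negative inputs, the parameters `p0` and `d`, and the rule "randomize with probability `p0 * d**(r - 1)` in round `r`" are all assumptions for illustration.

```python
import random


def multi_round_max(values, rounds, p0=1.0, d=0.5, rng=None):
    """Randomized ring max over several rounds (illustrative sketch only).

    A node whose value exceeds the incoming global value either passes on a
    random value between the two (hiding its real value) or, with the
    remaining probability, its real value. Assumes non-negative values.
    """
    rng = rng or random.Random()
    n = len(values)
    start = rng.randrange(n)          # randomized starting point
    g = 0                             # assumed initial global value
    for r in range(1, rounds + 1):
        p = p0 * d ** (r - 1)         # assumed randomization probability
        for step in range(n):
            i = (start + step) % n
            if values[i] > g:
                if rng.random() < p:
                    g = rng.uniform(g, values[i])  # perturbed, never above v_i
                else:
                    g = values[i]                  # real value revealed
            # otherwise the node passes g on unchanged
    return g


print(multi_round_max([30, 10, 40, 20], rounds=5, rng=random.Random(1)))
```

In this sketch the global value never decreases and never exceeds the true maximum, so extra rounds can only move it closer to the correct answer. Setting `p0=0.0` removes the randomization and returns the exact maximum after one round.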


Evaluation Parameters

  • Large k = "avoid information leaks"

  • Large d = more randomization = more privacy

  • Small d = more accurate (deterministic)

  • Large r = "as accurate as ordinary classifier"
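The trade-off between d and r can be made concrete with a small calculation. The slide does not define the parameters, so this sketch assumes that d is a dampening factor on the randomization probability, that r counts rounds, and that the probability in round r is `p0 * d**(r - 1)`. The helper name `randomization_schedule` is hypothetical.

```python
def randomization_schedule(p0, d, rounds):
    """Return the assumed randomization probability for rounds 1..rounds."""
    return [p0 * d ** (r - 1) for r in range(1, rounds + 1)]


for d in (0.25, 0.5):
    print(d, randomization_schedule(1.0, d, 4))
# 0.25 [1.0, 0.25, 0.0625, 0.015625]
# 0.5 [1.0, 0.5, 0.25, 0.125]
```

Under these assumptions a larger d keeps nodes randomizing for more rounds (more privacy), while a smaller d lets real values through sooner (more accuracy for a given number of rounds).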





Conclusion

  • Problems Tackled

    • Preserving efficiency and accuracy while introducing provable privacy to the system

    • Improving a naive protocol

    • Reducing privacy risk in an efficient manner


Critique

  • Dependency on other research papers in order to obtain a full understanding

  • Few/No Illustrations

  • A real-life example would have made the charts easier to understand

