On method-specific record linkage for risk assessment

Jordi Nin

Javier Herranz

Vicenç Torra


Contents

  • Disclosure Risk Scenario: how an intruder re-identifies an individual

  • Preliminaries: protection methods and record linkage

  • Location record linkage: a new way to compute the disclosure risk

  • Conclusions and future work




Disclosure Risk Scenario

Attribute classification

  • Identifiers: passport number

  • Quasi-identifiers: age, postal code

  • Confidential: income

[Figure: data file X, with dimension labels n and a]


Re-identification scenario

X = id || Xnc || Xc

X’ = X’nc || Xc

  • Privacy is ensured: quasi-identifiers are anonymized

  • Data quality is preserved: confidential attributes are preserved


Record Linkage

Problem: find a correct mapping between data set 1 and data set 2

[Figure: data set 1 (labelled X1 X2 X3 X4) linked to data set 2 (labelled X’1 X’2 X’3 X’4)]


Distance-based record linkage

  • The nearest pairs of records are considered linked pairs

  • It is very easy to tune

  • Results depend heavily on the parameters

  • Moderate time cost

Probabilistic record linkage

  • Linked pairs are computed using conditional probabilities

  • Tuning is difficult

  • Few parameters

  • High time cost
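
The distance-based variant can be illustrated with a minimal sketch. It assumes numeric records compared with the Euclidean distance, and it assumes that record i in both files describes the same individual, so that correct links can be counted. The function names and the toy data are hypothetical; in practice the attributes would usually be standardized before distances are computed.

```python
import math


def distance_based_linkage(original, protected):
    """Link each protected record to its nearest original record.

    Returns a list `links` where links[i] is the index of the original
    record closest (Euclidean distance) to protected[i].
    """
    links = []
    for p in protected:
        nearest = min(range(len(original)),
                      key=lambda j: math.dist(p, original[j]))
        links.append(nearest)
    return links


def correct_links(links):
    """Count re-identifications, assuming record i of both files
    describes the same individual (an assumption of this sketch)."""
    return sum(1 for i, j in enumerate(links) if i == j)


# Toy data (hypothetical): (age, income) before and after protection.
original = [(25, 1000), (40, 3000), (60, 2000)]
protected = [(27, 1100), (41, 2900), (58, 2050)]
links = distance_based_linkage(original, protected)
print(links, correct_links(links))
```

Every protected record is compared with every original record here, which is the exhaustive comparison that standard record linkage performs.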




Preliminaries

Rank swapping - p

Algorithm

  For each attribute attrj, where $1 \le j \le n$

    Attrj is sorted

    Each value xij is swapped with a value xil, where $i < l \le i + p$

    The sorting of attrj is undone

  End for

End algorithm

  • Simple

  • Preserves $\mu$ and $\sigma$

  • All combinations disappear
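
A minimal sketch of rank swapping for a single attribute follows. It assumes that p is a percentage of the number of records, that the swap partner is drawn uniformly at random inside the allowed window, and that a value already swapped is not swapped again; none of these details are stated on the slide, and the function name `rank_swap` is hypothetical.

```python
import random


def rank_swap(values, p, rng=None):
    """Rank-swap one attribute.

    `p` is read as a percentage: the value at sorted position i may only
    be swapped with a value at sorted position l, i < l <= i + window,
    where window is p% of the number of records (at least 1).
    """
    rng = rng or random.Random()
    n = len(values)
    window = max(1, int(n * p / 100))
    # Record indices ordered by value, and the sorted attribute itself.
    order = sorted(range(n), key=lambda i: values[i])
    ranked = [values[i] for i in order]
    swapped = [False] * n
    for i in range(n):
        if swapped[i]:
            continue
        candidates = [l for l in range(i + 1, min(n, i + window + 1))
                      if not swapped[l]]
        if not candidates:
            continue  # no free partner in the window: value stays
        l = rng.choice(candidates)
        ranked[i], ranked[l] = ranked[l], ranked[i]
        swapped[i] = swapped[l] = True
    # Undo the sorting so each record gets its (possibly swapped) value.
    result = [None] * n
    for rank, idx in enumerate(order):
        result[idx] = ranked[rank]
    return result


values = [8, 6, 10, 7, 9, 2, 1, 4, 5, 3]
protected = rank_swap(values, 20, random.Random(0))
```

Because values are only exchanged, never altered, the protected attribute contains exactly the same values as the original one, which is why the mean and the standard deviation are preserved.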


Rank swapping - p example

p = 20%

[Figure: two columns of ten values: 1 2 3 4 5 6 7 8 9 10 and 8 6 10 7 9 2 1 4 5 3]


Microaggregation - k

k = 3

[Figure: clusters labelled with a and k]

  • a = 1 → optimal

  • a > 1, NP-hard → heuristic


Optimal univariate Microaggregation

Result 1. When the elements are sorted according to an attribute, the elements of each cluster in any optimal partition are contiguous (clusters do not overlap).

Result 2. All clusters of any optimal partition have between k and 2k-1 elements.

Clusters are built from the nodes of the path returned by a shortest path algorithm.

[Figure: example with k = 2 and elements x1, x2, x3, x4]
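
Results 1 and 2 can be turned into a small dynamic program, sketched below. It assumes the objective is the within-cluster sum of squared errors (SSE), which the slide does not state, and it walks the sorted values considering only contiguous clusters of size k to 2k-1. This is equivalent to a shortest path over cut positions, but it is a sketch under these assumptions rather than the authors' implementation; the function name is hypothetical.

```python
def optimal_univariate_microaggregation(values, k):
    """Replace each value by the mean of its cluster, choosing the
    partition with minimal SSE among contiguous clusters of size
    k..2k-1 on the sorted values."""
    n = len(values)
    if n < k:
        raise ValueError("need at least k values")
    order = sorted(range(n), key=lambda i: values[i])
    xs = [values[i] for i in order]
    # Prefix sums give the SSE of any contiguous run in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i, j):
        """SSE of the cluster xs[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [inf] * (n + 1)  # best[j]: cheapest partition of xs[:j]
    prev = [-1] * (n + 1)   # prev[j]: start of the last cluster
    best[0] = 0.0
    for j in range(k, n + 1):
        for size in range(k, min(2 * k - 1, j) + 1):
            i = j - size
            cost = best[i] + sse(i, j)
            if cost < best[j]:
                best[j] = cost
                prev[j] = i
    # Walk back through the cut positions and assign cluster means.
    result = [None] * n
    j = n
    while j > 0:
        i = prev[j]
        mean = (s1[j] - s1[i]) / (j - i)
        for pos in range(i, j):
            result[order[pos]] = mean
        j = i
    return result


aggregated = optimal_univariate_microaggregation([1, 2, 10, 11, 20, 21, 22], 2)
```

In the usage line, the seven values are split into the clusters {1, 2}, {10, 11} and {20, 21, 22}, and each value is replaced by its cluster mean.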


MDAV Microaggregation

MDAV is a multivariate heuristic microaggregation method

[Figure: X and X’, with k = 2]
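
The slide only names MDAV, so the sketch below follows the commonly described Maximum Distance to Average Vector procedure and should be read as an assumption about the steps, not as the authors' code. It takes the record farthest from the centroid, then the record farthest from that one, builds a cluster of k records around each, and repeats while at least 3k records remain. The function name `mdav` and the toy data are hypothetical.

```python
import math


def mdav(records, k):
    """Return the microaggregated file: each record is replaced by the
    centroid of its cluster (clusters hold at least k records when the
    file has at least k records)."""
    def centroid(idxs):
        dims = len(records[0])
        return tuple(sum(records[i][d] for i in idxs) / len(idxs)
                     for d in range(dims))

    def farthest(point, idxs):
        return max(idxs, key=lambda i: math.dist(records[i], point))

    def take_nearest(seed, idxs):
        """Remove from idxs and return the k records nearest to the seed."""
        group = sorted(
            idxs, key=lambda i: math.dist(records[i], records[seed]))[:k]
        for i in group:
            idxs.remove(i)
        return group

    remaining = list(range(len(records)))
    clusters = []
    while len(remaining) >= 3 * k:
        r = farthest(centroid(remaining), remaining)
        s = farthest(records[r], remaining)
        clusters.append(take_nearest(r, remaining))
        clusters.append(take_nearest(s, remaining))
    if len(remaining) >= 2 * k:
        r = farthest(centroid(remaining), remaining)
        clusters.append(take_nearest(r, remaining))
    if remaining:
        clusters.append(remaining)  # the leftover records form the last cluster

    protected = [None] * len(records)
    for cluster in clusters:
        c = centroid(cluster)
        for i in cluster:
            protected[i] = c
    return protected


masked = mdav([(0, 0), (0, 2), (10, 10), (10, 12)], 2)
```

Unlike univariate methods, the clusters here are built from distances over all attributes at once, so no per-attribute ordering of the original records is kept.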


Score: Protection method evaluation

Score = 0.5 IL + 0.5 DR

IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)

  • IL1 = mean of absolute error

  • IL2 = mean variation of average

  • IL3 = mean variation of variance

  • IL4 = mean variation of covariance

  • IL5 = mean variation of correlation

DR = 0.25 DLD + 0.25 PLD + 0.5 ID

  • DLD = number of links using DBRL (distance-based record linkage)

  • PLD = number of links using PRL (probabilistic record linkage)

  • ID = protected values near the original ones
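
The score formulas combine as in the following sketch. The weights are the ones given above; the function names and the sample inputs are hypothetical, and the sketch assumes the DR components are already expressed on a comparable scale (for example percentages), which the slide does not specify.

```python
def information_loss(il1, il2, il3, il4, il5):
    """IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)."""
    return 100 * (0.2 * il1 + 0.2 * il2 + 0.2 * il3 + 0.2 * il4 + 0.2 * il5)


def disclosure_risk(dld, pld, interval_disclosure):
    """DR = 0.25 DLD + 0.25 PLD + 0.5 ID."""
    return 0.25 * dld + 0.25 * pld + 0.5 * interval_disclosure


def score(il, dr):
    """Score = 0.5 IL + 0.5 DR."""
    return 0.5 * il + 0.5 * dr


# Hypothetical measurements for one protection method.
il = information_loss(0.1, 0.2, 0.3, 0.1, 0.3)
dr = disclosure_risk(20, 10, 40)
print(score(il, dr))
```

Information loss and disclosure risk carry equal weight in the score, so a method that lowers one of them only improves its score if the other does not rise by the same amount.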




Location Problem Description

L-RL: Location Record Linkage

  • Standard record linkage compares all records

  • Rank swapping, univariate microaggregation and other methods only use some original records to create each protected record

  • It is therefore unnecessary to compare all the records
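
For rank swapping, one way to read this observation is sketched below: a protected value can only come from an original record whose rank in that attribute lies within the swap window, so the candidates for each protected record can be narrowed attribute by attribute and intersected. This is an assumed interpretation, since the transcript does not spell out the procedure; `candidate_sets`, the window rule (p% of the records, at least 1) and the toy files are hypothetical.

```python
from bisect import bisect_left, bisect_right


def candidate_sets(original, protected, p):
    """For each protected record, return the set of original record
    indices compatible with a rank swap of parameter p (a percentage)
    in every attribute."""
    n, a = len(original), len(original[0])
    window = max(1, int(n * p / 100))
    candidates = [set(range(n)) for _ in protected]
    for j in range(a):
        # Original values of attribute j with their record index, by rank.
        column = sorted((rec[j], i) for i, rec in enumerate(original))
        sorted_vals = [v for v, _ in column]
        for r, prot in enumerate(protected):
            # Ranks the protected value occupies, widened by the window.
            lo = bisect_left(sorted_vals, prot[j]) - window
            hi = bisect_right(sorted_vals, prot[j]) + window
            allowed = {i for _, i in column[max(0, lo):min(n, hi)]}
            candidates[r] &= allowed
    return candidates


original = [(v, 10 * v) for v in range(1, 11)]
# Hypothetical protected file: neighbouring ranks exchanged in both attributes.
protected = [original[r + 1] if r % 2 == 0 else original[r - 1]
             for r in range(10)]
cands = candidate_sets(original, protected, p=10)
```

In this toy file each protected record keeps at most three of the ten original records as candidates, and a distance-based comparison would then only need to run inside each candidate set.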


Location record linkage

Method Description

[Figure: Xext and X’]


Example: Rank swapping

p = 20%

[Figure: worked example with a Distance column; values shown: 17, 6, 13, 14, 16, 19, 12, 5, 16]


Rank Swapping Experiments

Data sets:

  • Census (1080 records, 13 attributes)

  • EIA (4092 records, 10 attributes)

Rank swapping configurations:

  • p = 2 … 20

Score modifications:

  • DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID, where LLD is the number of links using L-RL


L-RL: Rank Swapping Linkage Results

L-RL: Rank Swapping Score Results


Univariate Microaggregation Experiments

Data sets:

  • Census (1080 records, 13 attributes)

  • EIA (4092 records, 10 attributes)

Univariate microaggregation configurations:

  • k = 10 … 50

Score modifications:

  • DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID


L-RL: Univariate Microaggregation Linkage Results

L-RL: Univariate Microaggregation Score Results


MDAV Experiments

Data sets:

  • Census (1080 records, 13 attributes)

  • EIA (4092 records, 10 attributes)

MDAV configurations:

  • k = 10 … 50

Score modifications:

  • DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID


L-RL: MDAV Linkage Results

L-RL: MDAV Score Results




Conclusions and future work

Conclusions

  • We have presented a new type of record linkage designed to exploit the limitations of some protection methods

  • The L-RL method obtains a more accurate DR evaluation for rank swapping and univariate microaggregation

  • MDAV is immune to the location problem

Future work

  • We plan to study the DR of MDAV and other protection methods using other ad-hoc methods

