
Protecting Confidentiality

Protecting Confidentiality in a Virtual Data Centre. Computational Informatics. Christine O’Keefe, Mark Westcott, Adrien Ickowicz, Maree O’Sullivan (CSIRO); Tim Churches (Sax Institute). 28 October 2012. Overview: introduction to the problem; Virtual Data Centres; proposed solution.


Presentation Transcript


  1. Protecting Confidentiality in a Virtual Data Centre
     Computational Informatics
     Christine O’Keefe, Mark Westcott, Adrien Ickowicz, Maree O’Sullivan (CSIRO); Tim Churches (Sax Institute)
     28 October 2012

  2. Overview
     • Introduction to the problem
     • Virtual Data Centres
     • Proposed solution
     Confidentiality in Virtual Data Centres | Christine O’Keefe

  3. Population Health Research Network*
     • Provides access to linkable de-identified health data for research
       • Improving outcomes
       • Improving policy
     • Traditionally
       • Supplies linkable de-identified health data directly to researchers
       • Loss of control over data heightens risk of:
         • External attack on datasets
         • Accidental or inadvertent actions by researcher
         • Deliberate attack by trusted researcher
     *www.phrn.org.au

  4. Secure Unified Research Environment*
     • Secure remote access to virtual workstations and network in a data centre
     *Sax Institute SURE User Guide v1.2

  5. Confidentiality Protection for Health Data
     • Governance
       • Comply with privacy legislation and regulation
       • Honour assurances to data providers
       • Restrict access to approved researchers
     • Information security measures
     • Restrict amount and detail of data available
     • Apply statistical disclosure control methods before releasing data to researcher
       • No further confidentiality measures
     • Enable access via secure on-line system
       • Manual checking for confidentiality issues in statistical analysis outputs
       • “…developing valid output checking processes that are automated is an open research question” (Duncan, Elliot, Salazar-González 2012)

  6. Conceptual Model for online access
     • Remote Analysis (RA)
       • Researcher cannot see data itself, only “Output for publication”
     • Virtual Data Centre (VDC)
       • Researcher authorised to see data and “Output” as well as “Output for publication”

  7. Virtual Data Centre
     • Assumptions
     • Custodian prepares data to comply with legislation, regulation and assurances
     • Researcher complies with applicable researcher agreements
     • Researcher authorised to see data itself
     • Do not need to protect dataset records from researcher
     • Do not need to protect against malicious attacks by researcher
     • Data transformations and analyses are unrestricted
     • Confidentiality issues with respect to readers of academic literature
     • Confidentiality issues with respect to outputs of genuine queries

  8. Main Disclosure Risks in Statistical Output
     • Individual values
     • Small cells/samples … threshold
     • Dominance
     • Differencing
     • Linear or other algebraic relationships in data
     • Precision
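The differencing risk listed on this slide can be illustrated with a small sketch. The record ids and income values below are hypothetical and do not come from the presentation; the point is only that two aggregate statistics, each harmless on its own, can reveal one record when their populations differ by a single member.

```python
# Hypothetical records; none of these values come from the presentation.
incomes = {"r1": 52_000, "r2": 61_000, "r3": 48_000, "r4": 250_000}


def total(records, exclude=()):
    """Sum the values of all records whose id is not in `exclude`."""
    return sum(value for rid, value in records.items() if rid not in exclude)


total_all = total(incomes)                        # statistic on the full population
total_minus_one = total(incomes, exclude={"r4"})  # same statistic, one record fewer

# Neither total reveals an individual value, but their difference does.
recovered = total_all - total_minus_one
print(recovered)  # prints 250000, the value of record r4
```

The same reasoning applies to counts and means, which is why the checklist later asks how many records separate the populations behind two published statistics.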

  9. Confidentiality Protection in a Virtual Data Centre – two stage process
     • Stage 1: Dataset preparation – by Custodian
     • Stage 2: Confidentialisation of statistical analysis output for publication – by Researcher
     • Similarities to:
       • ESSNet SDC Guidelines for checking output based on microdata research … Hundepool, Domingo-Ferrer, Franconi, Giessing, Nordholt, Spicer, de Wolf 2012
       • Statistics New Zealand Data Lab Output Guide

  10. Dataset preparation – by Custodian (stage 1)
     • Custodian
       • Removes obvious identifiers
       • Ensures dataset has sufficient records
       • Ensures published datasets differ by sufficiently many records
       • Ensures variables and combinations of variables have sufficiently many records
       • Reduces detail in data using aggregation (especially dates, locations)
       • Other measures as needed – statistical disclosure control
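Two of the custodian steps above, aggregating dates and locations and checking that combinations of variables have enough records, can be sketched as follows. This is a minimal illustration under stated assumptions, not the custodian's actual procedure: the field names (`birth_date`, `postcode`, `sex`), the choice to keep only the birth year and the first two postcode digits, and the minimum count of 3 are all hypothetical.

```python
from collections import Counter
from datetime import date


def coarsen(record):
    """Reduce detail: birth date becomes a year, postcode becomes a 2-digit region.

    The identifier field is dropped simply by not copying it.
    """
    return {
        "birth_year": record["birth_date"].year,
        "region": record["postcode"][:2],
        "sex": record["sex"],
    }


def rare_combinations(records, keys, min_count):
    """Return each combination of `keys` that occurs in fewer than `min_count` records."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {combo: c for combo, c in counts.items() if c < min_count}


# Hypothetical raw records, invented for illustration only.
raw = [
    {"name": "A", "birth_date": date(1980, 3, 14), "postcode": "2601", "sex": "F"},
    {"name": "B", "birth_date": date(1980, 7, 2), "postcode": "2602", "sex": "F"},
    {"name": "C", "birth_date": date(1980, 11, 30), "postcode": "2604", "sex": "F"},
    {"name": "D", "birth_date": date(1975, 1, 9), "postcode": "3000", "sex": "M"},
]

prepared = [coarsen(r) for r in raw]
flagged = rare_combinations(prepared, keys=("birth_year", "region", "sex"), min_count=3)
print(flagged)
```

Any combination returned by `rare_combinations` would need further treatment, for example coarser aggregation, before the dataset is made available.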

  11. Confidentialisation of statistical analysis output for publication – by Researcher
     • Researcher
       • Uses Checklist of tests to identify outputs that fail one or more tests
       • Considers context and interactions of outputs to identify potential disclosure risks
       • Applies treatments from Checklist to reduce potential disclosure risk
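The three researcher steps above can be sketched on a small frequency table. The transcript does not list the treatments in the Checklist, so cell suppression is used here purely as an assumed example of a treatment; the table, the threshold of 5 and the function names are likewise hypothetical.

```python
def suppress_small_cells(table, n):
    """Treatment (assumed): replace every count below n with None."""
    return {row: [c if c >= n else None for c in cells] for row, cells in table.items()}


def rows_recoverable_from_total(suppressed):
    """Context check: a row with exactly one suppressed cell can be
    reconstructed by anyone who also sees that row's published total."""
    return [row for row, cells in suppressed.items() if cells.count(None) == 1]


# Hypothetical counts by age group (rows) and region (columns).
table = {
    "age 20-39": [12, 3, 25],
    "age 40-59": [18, 2, 4],
    "age 60+": [30, 22, 15],
}

suppressed = suppress_small_cells(table, n=5)
at_risk = rows_recoverable_from_total(suppressed)
print(suppressed)
print(at_risk)
```

The second function shows why outputs have to be considered together: suppressing a single small cell does little if a row total published elsewhere lets a reader subtract the remaining cells. The same check would apply to column totals.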

  12. Checklist of Tests
     • Individual value: An individual data value is directly revealed
     • Threshold n: A cell or statistic is calculated on fewer than n data values
     • Threshold p%: A cell contains more than p% of the values in a table margin
     • Dominance (n,k): Amongst the records used to calculate a cell value or statistic, the n largest account for at least k% of the value
     • Dominance p%: Amongst the records used to calculate a cell value or statistic, the total minus the two largest values is less than p% of the largest value
     • Differencing: A statistic is calculated on populations that differ in fewer than n records
     • Relationships: The statistic involves linear or other algebraic relationships
     • Precision: The output involves a high level of precision in terms of significant figures and/or decimal places
     • Degrees of Freedom: The model output has fewer than n degrees of freedom
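Four of the tests above are stated precisely enough to be written as functions. The sketch below follows the wording of the slide; the function names, the assumption that contributing values are non-negative, and all parameter values in the example are hypothetical. Each function returns True when the output fails the test and therefore needs treatment.

```python
def fails_threshold_n(values, n):
    """Threshold n: the statistic is calculated on fewer than n data values."""
    return len(values) < n


def fails_threshold_p(cell_count, margin_count, p):
    """Threshold p%: the cell contains more than p% of the values in its margin."""
    return cell_count * 100 > p * margin_count


def fails_dominance_nk(values, n, k):
    """Dominance (n,k): the n largest values account for at least k% of the total.

    Assumes non-negative values with a positive total.
    """
    top_n = sum(sorted(values, reverse=True)[:n])
    return top_n * 100 >= k * sum(values)


def fails_dominance_p(values, p):
    """Dominance p%: total minus the two largest values is less than p% of the largest.

    Assumes non-negative values and at least one contributor.
    """
    ordered = sorted(values, reverse=True)
    remainder = sum(ordered[2:])
    return remainder * 100 < p * ordered[0]


# Hypothetical contributions to one cell: one record dominates the total.
skewed = [900, 50, 30, 20]
balanced = [100, 100, 100, 100, 100]

print(fails_threshold_n(skewed, n=5))
print(fails_threshold_p(cell_count=45, margin_count=50, p=80))
print(fails_dominance_nk(skewed, n=1, k=75))
print(fails_dominance_p(skewed, p=10))
print(fails_dominance_nk(balanced, n=1, k=75))
```

Integer arithmetic is used for the percentage comparisons so that boundary cases are not affected by floating-point rounding. The remaining tests (relationships, precision, degrees of freedom) depend on the form of the output and are not sketched here.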

  13. Checklist - examples

  14. Checklist - examples

  15. Summary
     • Virtual Data Centres
       • Becoming more popular
       • Manual checking of outputs for confidentiality risk not sustainable
       • Automated methods for confidentiality protection in statistical analysis outputs still under development
     • Interim Solution
       • Dataset preparation by Custodian
       • Researchers confidentialise their own outputs for publication
         • Training
         • Checklist of tests and confidentiality treatments

  16. Thank you
     Computational Informatics
     Dr Christine O’Keefe, Research Program Leader, Decision and User Science
     t +61 2 6216 7021
     e Christine.OKeefe@csiro.au
     w www.csiro.au
