Risk management and the release of microdata: balancing disclosure risks and data utility

Sonia Whiteley & Eric Skuja, The Social Research Centre

Presentation Transcript


  1. Risk management and the release of microdata: balancing disclosure risks and data utility
     Sonia Whiteley & Eric Skuja, The Social Research Centre

  2. About the Social Research Centre (1)
     We are a private, for-profit company owned by ANU Enterprise, a subsidiary of the Australian National University, and a co-founder of the Australian Centre for Applied Social Research Methods (AusCen). Our resources include 60 professional staff, a 125-station call centre, a panel of 250 interviewing staff and qualitative interviewing facilities. Typical services include survey design and execution (including sampling, questionnaire design, survey administration and interviewer training), qualitative research, survey data management, statistical consulting, and analytical and interpretative reporting.

  3. About the Social Research Centre (2)
     We conduct a number of large-scale surveys that contribute to the annual Report on Government Services (ROGS), which provides information on the equity, effectiveness and efficiency of government services in Australia. These surveys include the:
     • National Survey of Community Satisfaction with Policing (NSCSP)
     • Student Outcomes Survey (SOS), and
     • Australian Early Development Census (AEDC).

  4. About the AEDC (1)
     • The Australian Early Development Census (AEDC) is conducted every three years and covers every Australian child enrolled in their first year of full-time school.
     • Approximately 100 checklist questions cover the five theoretical domains of early childhood development:
       • Physical health and wellbeing
       • Social competence
       • Emotional maturity
       • Language and cognitive skills (school-based), and
       • Communication skills and general knowledge.

  5. About the AEDC (2)
     The AEDC was first conducted in 2009, with a second collection in 2012. Preparations are already underway for the 2015 collection. Approximately 290,000 checklists were completed in each AEDC collection, covering more than 96 per cent of in-scope children. All three Australian school systems (Government, Catholic and Independent) actively participate in the AEDC.

  6. Our role in the AEDC
     Our involvement in the AEDC is ongoing and includes:
     • Collecting checklist data from teachers via a secure online system
     • Developing and maintaining a government website containing:
       • AEDC resources (http://www.aedc.gov.au/), and
       • AEDC macrodata and maps (http://www.aedc.gov.au/data)
     • Managing and disseminating the AEDC data collections.

  7. Traditional risk management of microdata
     • A ‘worst case’ scenario approach makes data custodians responsible for identifying and mitigating all potential risks.
     • This model assumes that data users:
       • are unprofessional
       • cannot be trusted
       • do not have the required skills or training, and
       • (in extreme cases) intend to maliciously misuse the data.
     • Data must be protected from the users, and the utility of the unit record data for research is a lower priority.
     • The product of this approach is typically a confidentialised unit record file, and confidentialisation is regarded as the primary safeguard.

  8. Initial approach to AEDC data management
     • The AEDC Data Protocol and the AEDC Linkage Policy give the research community guidance on appropriate uses of the data.
     • Two confidentialised unit record files (CURFs) were produced:
       • the Research CURF, and
       • the Geography CURF.
     • The files were split because of the:
       • large number of demographic variables
       • fine level of geographic information available, and
       • concerns about the re-identification and disclosure of children, classes and schools.

  9. Perturbation issues
     • Both files were perturbed by an external agency based on a ‘worst case scenario’ view of risk.
     • The perturbation rules were not disclosed, to prevent reverse engineering.
     • Key variables of relevance to early childhood education researchers were changed significantly.
     • In particular, the child's gender was altered substantially in some geographic areas.
     • Government agencies began using the CURFs because they were smaller, more accessible files, which led to discrepancies between the official results and those produced by the agencies.
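
The perturbation rules applied to the AEDC files were never disclosed, so the following sketch does not reproduce them. It is a generic illustration, assuming a post-randomisation (PRAM) style approach in which each record's gender is kept with probability p_keep and otherwise switched to the other category. The function names, the "F"/"M" coding and the example areas are all hypothetical. It illustrates why the same flip rate affects a small area far more than a large one.

```python
import random
from collections import Counter

# Generic PRAM illustration only; NOT the (undisclosed) rules applied to the AEDC CURFs.
CATEGORIES = ("F", "M")  # hypothetical coding of the gender variable


def pram_value(value, p_keep, rng):
    """Keep `value` with probability p_keep; otherwise replace it with another category."""
    if rng.random() < p_keep:
        return value
    return rng.choice([c for c in CATEGORIES if c != value])


def perturb(records, p_keep, seed=None):
    """Apply PRAM-style perturbation to the gender field of (area, gender) records."""
    rng = random.Random(seed)
    return [(area, pram_value(gender, p_keep, rng)) for area, gender in records]


def female_share_by_area(records):
    """Proportion of records coded 'F' in each area."""
    totals = Counter(area for area, _ in records)
    females = Counter(area for area, gender in records if gender == "F")
    return {area: females[area] / totals[area] for area in totals}


if __name__ == "__main__":
    # Hypothetical data: a large area (1,000 children) and a small area (8 children)
    records = ([("Large", "F")] * 500 + [("Large", "M")] * 500
               + [("Small", "F")] * 4 + [("Small", "M")] * 4)
    before = female_share_by_area(records)
    after = female_share_by_area(perturb(records, p_keep=0.9, seed=1))
    for area in before:
        print(f"{area}: {before[area]:.3f} -> {after[area]:.3f}")
```

Each flipped record moves the small area's female share by 1/8, but the large area's by only 1/1,000, so a perturbation that is negligible at the aggregate level can substantially distort small-area results.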

  10. Alternative approaches to risk management
     • Responsibility for appropriately using and reporting microdata is shared between the data custodian and the research community.
     • Researchers are assumed to observe their professional codes of conduct and not to intend to misuse the data.
     • The main risk focus is on ensuring that researchers appropriately handle, store and publish the data. This includes:
       • confirming a genuine research aim
       • restricting data access to authorised users
       • ensuring the microdata is anonymised, and
       • undertaking a risk assessment of files prior to release.

  11. Formal risk assessments
     • Government departments remain extremely risk averse, especially when it comes to information about very young children.
     • Two simple risk assessments accompany each microdata file. Both serve as a basis for negotiation and support rather than as obstacles to access:
       • The proportion of unique records in the dataset is assessed, mainly to discourage other government agencies from undertaking unauthorised data linkage projects.
       • The proportion of cells in two-way tables containing three or fewer children is calculated to foreshadow potential problems when researchers publish the data.
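
Both assessments described above are simple to compute. The following minimal Python sketch assumes the microdata is a list of dictionaries and that the analyst chooses which variables act as keys for the uniqueness check. The function names, the `threshold` parameter and the example fields are hypothetical. As a further assumption, the small-cell measure counts only non-empty cells; treating empty cells as disclosive too would change the denominator.

```python
from collections import Counter
from itertools import combinations


def proportion_unique(records, key_vars):
    """Share of records whose combination of key variables appears exactly once."""
    keys = [tuple(record[v] for v in key_vars) for record in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


def proportion_small_cells(records, row_var, col_var, threshold=3):
    """Share of non-empty cells in a row_var x col_var table with `threshold` or fewer records."""
    cells = Counter((record[row_var], record[col_var]) for record in records)
    return sum(1 for n in cells.values() if n <= threshold) / len(cells)


def small_cell_report(records, variables, threshold=3):
    """Small-cell proportion for every two-way table formed from `variables`."""
    return {
        (a, b): proportion_small_cells(records, a, b, threshold)
        for a, b in combinations(variables, 2)
    }


if __name__ == "__main__":
    # Hypothetical records; real AEDC variable names are not assumed here
    records = [
        {"sex": "F", "age": 5, "region": "A"},
        {"sex": "M", "age": 5, "region": "A"},
        {"sex": "F", "age": 5, "region": "A"},
        {"sex": "M", "age": 6, "region": "B"},
    ]
    print(proportion_unique(records, ["sex", "age", "region"]))  # 0.5
    print(small_cell_report(records, ["sex", "age", "region"]))
```

Computing the small-cell proportion for every pair of variables, rather than a single table, gives a quick overview of which cross-tabulations are likely to need suppression or aggregation before publication.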

  12. High-risk data: built environment measures
     Child Friendly Neighbourhoods is a composite index that requires the children's X-Y coordinates. It is based on the following built environment measures:
     • Child health resources: proximity of childcare facilities
     • Parks and greenness: proximity of neighbourhood parks
     • Residential density: number of residential dwellings
     • Home environment: type of residence; size of backyard
     • Traffic exposure: road network classifications
     • Crime: crime and child-related offences
     • Land use mix: evenness of different land uses
     • Public transport: accessibility of bus and rail stops
     • Street connectivity: number of intersections with three or more ways
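
Proximity measures such as those for childcare facilities or neighbourhood parks are calculated relative to where each child lives, which is why the index needs exact coordinates. The following minimal sketch assumes projected X-Y coordinates in metres and straight-line (Euclidean) distance. The actual index may use road-network distances or other definitions, and the function names and values here are hypothetical.

```python
import math


def nearest_distance(child_xy, facilities):
    """Straight-line distance from a child's location to the nearest facility."""
    cx, cy = child_xy
    return min(math.hypot(fx - cx, fy - cy) for fx, fy in facilities)


def count_within(child_xy, facilities, radius):
    """Number of facilities within `radius` of the child's location."""
    cx, cy = child_xy
    return sum(1 for fx, fy in facilities if math.hypot(fx - cx, fy - cy) <= radius)


if __name__ == "__main__":
    # Hypothetical projected coordinates in metres (assumption, not AEDC data)
    child = (0.0, 0.0)
    parks = [(300.0, 400.0), (1000.0, 0.0)]
    print(nearest_distance(child, parks))   # 500.0
    print(count_within(child, parks, 800))  # 1
```

Each value depends directly on the child's precise location, so the coordinates that feed these calculations are the sensitive input.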

  13. Managing high-risk data: points dispersed within a mesh block
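
This slide shows children's locations displaced to points dispersed within their mesh block, the smallest geographic unit defined by the ABS. The exact dispersal method is not described. The following minimal sketch shows one common approach, assuming each mesh block is available as a simple polygon in projected coordinates. It replaces each child's point with a point drawn uniformly at random inside the same polygon, using rejection sampling from the bounding box and a ray-casting point-in-polygon test. All function names and example blocks are hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside `polygon`, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_point_in_polygon(polygon, rng, max_tries=10_000):
    """Uniform random point inside `polygon`, by rejection sampling from its bounding box."""
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    for _ in range(max_tries):
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, polygon):
            return x, y
    raise ValueError("no point found inside the polygon")


def disperse_children(child_mesh_block, mesh_block_polygons, seed=None):
    """Map each child ID to a random point inside that child's mesh block.

    The true coordinates are discarded; only mesh block membership is kept.
    """
    rng = random.Random(seed)
    return {
        child_id: random_point_in_polygon(mesh_block_polygons[mb_id], rng)
        for child_id, mb_id in child_mesh_block.items()
    }


if __name__ == "__main__":
    # Hypothetical mesh blocks: a 100 m square and an L-shaped block
    blocks = {
        "MB1": [(0, 0), (100, 0), (100, 100), (0, 100)],
        "MB2": [(200, 0), (400, 0), (400, 100), (300, 100), (300, 200), (200, 200)],
    }
    print(disperse_children({"child_1": "MB1", "child_2": "MB2"}, blocks, seed=42))
```

Because the new point depends only on the mesh block, any analysis at mesh block level or above is unaffected, while the child's actual address cannot be recovered from the released coordinates.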

  14. Implications of using anonymised microdata
     • Improves data utility without necessarily presenting a higher level of disclosure risk than a CURF.
     • Ensures that there is ‘one version of the truth’ and that outputs produced by researchers will be consistent across all data users.
     • Ensures that access requests for any unit record file follow the same formal, detailed assessment, management and close-out procedures.
     • Concerns about unintentional misuse of microdata need to be clearly communicated to the research community.
     • Provides an avenue for making training and support a condition of data release where a potential data user may not have the required skills or experience.
     • Creates a platform where all genuine data access requests can be accommodated through a combination of engaged negotiation about the required data elements and supported (and supportive) access modalities.
