
The new Bank of Italy Remote access to micro Data (BIRD)

G. Bruno, L. D’Aurizio, R. Tartaglia-Polcini. Q2008, Rome, July 10, 2008.


Presentation Transcript


  1. The new Bank of Italy Remote access to micro Data (BIRD) G. Bruno, L. D’Aurizio, R. Tartaglia-Polcini Q2008 – Rome, July 10, 2008

  2. Motivation • Information release and data protection as competing goals • The risk-utility tradeoff: the risk of data disclosure versus the utility of widespread availability of data for research

  3. Motivation GOALS (UTILITY): • satisfy growing demand from external researchers for business data • improve the accountability of the Central Bank as an economic research centre • provide a service to the scientific community CONSTRAINTS (RISK): • Data confidentiality must be guaranteed: it is a prerequisite for respondents’ collaboration, it fosters the quality of the data provided, and it is required by law • A Public Use File (PUF) with individual data was judged unfeasible: anonymisation is very problematic with business data

  4. Motivation SYNTHETIC DATA LIMITATIONS: • Identity disclosure impossible in principle, but, particularly with extreme values, it may be possible to re-identify a source record • Attribute disclosure may happen • Ample literature on data confounding and synthetic data (Duncan & Lambert 1989; Rubin 1993; Little 1993; Fuller 1993; Fienberg et al. 1996; Kennickell 1997; Abowd & Woodcock 2001; Reiter 2002; Raghunathan et al. 2003; etc.)

  5. Choices • Data confounding: create a PUF containing perturbed data to prevent identification of individual information. Downside: results (esp. regressions) may depend heavily on the confounding technique adopted, and the literature is controversial • Data lab (à la Istat: ADELE): the researcher has to go to the lab in person • Remote processing over the internet, without direct access to individual data (à la Luxembourg Income Study: LISSY)

  6. Other remote processing systems • Luxembourg Income Study (LISSY, 1987) • Statistics Canada (2001) • Statistics Denmark (2001) • Statistics Netherlands (2002) • Australian Bureau of Statistics (2003) • Statistics Sweden (2003) • US Federal Agencies: NCHS (1997), NCES (1998), Census Bureau (2003)

  7. The solution adopted at the Bank of Italy BIRD • Modeled on LISSY • Low setup cost • Easily customisable • Supports multiple packages • Maximum accessibility for users • Multi-level control (user/group, dataset, keyword) • Automatic and manual checks & review

  8. How BIRD works USER ELIGIBILITY CRITERIA • Researcher status (not necessarily academic) proved by a presentation letter • Identification via valid personal id • Detailed information via form to be filled in

  9. How BIRD works USER PROFILE CREATION • The researcher indicates an e-mail address which will be recognised by the system • The researcher chooses her own username and password • User-chosen parameters are entered in the user database • The access profile is created

  10. How BIRD works SUBMISSION PROCEDURE • Communication with the processing environment via e-mail • Send a message containing user authentication info + statements to be submitted • Input message is parsed and checks are performed • If no error/security violation → submit statements • Output is parsed (automatically / manually) • If no security violation → forward to the user via e-mail
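
The submission procedure above can be sketched as a small pipeline. This is a minimal illustration only, not the Bank of Italy's implementation: the message layout (user on the first line, password on the second, statements after) is an assumption, the `REGISTERED` mailbox registry and every function name are hypothetical, and the input check, the processing step and the output review are passed in as placeholder callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: whitelisted mailbox -> (user, password).
REGISTERED: Dict[str, Tuple[str, str]] = {
    "researcher@example.org": ("alice", "s3cret"),
}


@dataclass
class Submission:
    sender: str
    user: str
    password: str
    statements: List[str]


def parse_message(sender: str, body: str) -> Submission:
    """Assumed layout: line 1 = user, line 2 = password, rest = statements."""
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    if len(lines) < 3:
        raise ValueError("expected credentials followed by at least one statement")
    return Submission(sender, lines[0], lines[1], lines[2:])


def handle(
    sender: str,
    body: str,
    input_ok: Callable[[List[str]], bool],
    run: Callable[[List[str]], str],
    output_ok: Callable[[str], bool],
) -> str:
    """Parse, authenticate, check input, run, check output, then reply."""
    try:
        sub = parse_message(sender, body)
    except ValueError as err:
        return f"rejected: {err}"
    # Both the mailbox and the credentials must match the registry.
    if REGISTERED.get(sub.sender) != (sub.user, sub.password):
        return "rejected: unknown mailbox or bad credentials"
    if not input_ok(sub.statements):
        return "rejected: forbidden statements"
    output = run(sub.statements)
    if not output_ok(output):
        return "withheld: output failed review"
    return output  # would be forwarded to the user by e-mail
```

The two checks are kept as separate callables because the slides describe them as distinct stages: the input is screened before anything runs, and the output is screened again (automatically or manually) before anything is sent back.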

  11. Confidentiality safeguards • User level • Data level • Processing level

  12. Confidentiality safeguards User level: • Users are identified, qualified and registered • Registered mailboxes are whitelisted; ordinarily only one mailbox per user • Outputs are monitored and archived • Deontological code, privacy law, specific penalties SANCTIONS: • Forbidden submissions or outputs are deleted • Access may be revoked for users who try to run forbidden commands • Any other sanctions or penalties required by law, where applicable

  13. Confidentiality safeguards Data level: • Extreme data are censored (Winsorized) • Identifying variables (ids, names, addresses) are expunged from the datasets used for remote processing • Stratification variables are collapsed (geographical areas rather than regions; Ateco aggregations rather than codes)
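
As a rough illustration of two of the data-level safeguards, the sketch below censors extreme values by Winsorizing and collapses a detailed stratification variable into a coarser one. The cut-off quantiles, the simple order-statistic rule and the small region-to-area table are assumptions chosen for the example; the slides do not state the thresholds or mappings actually used.

```python
from typing import Dict, List


def winsorize(values: List[float], lower_q: float = 0.0, upper_q: float = 0.95) -> List[float]:
    """Clamp values to the order statistics at lower_q and upper_q.

    The quantile rule here (truncated index into the sorted data) is an
    illustrative choice, not a documented BIRD parameter.
    """
    if not values:
        return []
    ordered = sorted(values)
    last = len(ordered) - 1
    low = ordered[int(lower_q * last)]
    high = ordered[int(upper_q * last)]
    return [min(max(v, low), high) for v in values]


# Illustrative mapping from regions to broader geographical areas.
REGION_TO_AREA: Dict[str, str] = {
    "Lombardia": "North-West",
    "Veneto": "North-East",
    "Lazio": "Centre",
    "Sicilia": "South and Islands",
}


def collapse_region(region: str) -> str:
    """Replace a detailed region with its broader area."""
    return REGION_TO_AREA.get(region, "Unknown")


workforce = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(winsorize(workforce, upper_q=0.9))  # the outlier 1000 is pulled down to 9
print(collapse_region("Lazio"))
```

Winsorizing keeps every record in the dataset, unlike trimming, so sample sizes are unchanged while the most identifiable extreme values are masked.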

  14. Confidentiality safeguards Processing level: • Formally forbidden to display individual data • Keyword parser implemented with ceiling, blacklist and graylist • Particularly long and/or complex programmes are always reviewed manually • In the learning stage, all submissions are reviewed manually
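
A keyword parser with a ceiling, a blacklist and a graylist could look roughly like the sketch below. The semantics are assumptions, since the transcript does not preserve the parser table: here a blacklisted keyword rejects the submission outright, a graylisted keyword or a programme longer than the ceiling sends it to manual review, and anything else is submitted automatically. The keyword sets and the ceiling value are hypothetical.

```python
import re

# Hypothetical keyword lists; the real lists are not given in the slides.
BLACKLIST = {"list", "browse", "edit", "print"}  # would display individual records
GRAYLIST = {"tabulate", "sort", "merge"}         # allowed, but only after review
CEILING = 50                                     # assumed maximum number of statements


def screen(program: str) -> str:
    """Return 'reject', 'manual_review' or 'auto_submit' for a submitted programme."""
    statements = [ln for ln in program.splitlines() if ln.strip()]
    # Whole-word tokens, so 'listwise' does not trigger the 'list' rule.
    tokens = set(re.findall(r"[a-z_]+", program.lower()))
    if tokens & BLACKLIST:
        return "reject"
    if tokens & GRAYLIST or len(statements) > CEILING:
        return "manual_review"
    return "auto_submit"


print(screen("regress y x"))           # auto_submit
print(screen("list firm_id in 1/10"))  # reject
print(screen("tabulate area"))         # manual_review
```

Matching whole tokens rather than substrings keeps false positives down, at the cost of missing abbreviated commands; a production parser would need to know each package's abbreviation rules.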

  15. How the parser works (*) This feature will be available in the next release of the system.

  16. Datasets available STANDARD DATASET: quantitative data for the biggest firms (in terms of workforce) are censored (Winsorised) COMPLETE DATASET: no data censoring Id variables are expunged from both datasets, obviously

  17. Datasets available Aggravated procedure for accessing the complete dataset: • Access must be explicitly requested – a special profile is created • Review is exclusively manual • Wait times are longer than average, because only limited time is allocated to manual review of submissions on the complete dataset

  18. Documentation on the website • Application form • Instruction manual • Dataset description • Examples of submissions in the supported packages (SAS, Stata) • Methodological notes on the survey

  19. Support • Documentation available on the Bank of Italy website (manuals, variables description, questionnaires): http://www.bancaditalia.it/statistiche/indcamp/indimpser/bird • Mailbox for queries and assistance: bird_assist@bancaditalia.it

  20. An example Program submitted by the user in Stata. Authentication is in the first four lines.

  21. An example Output forwarded after review

  22. Usage of the system in the first weeks • System started officially on Mar 13, 2008 • Beta users from Feb 1, 2008 • 8 registered users • 172 submissions in 21 weeks

  23. Usage of the system in the first weeks

  24. Future developments • Web submission available alongside e-mail submission • Other datasets will be made available in the future (e.g. data from the Business Outlook Survey) • Support for open-source packages (e.g. R) • Merging with external datasets provided by the user, for special projects, on a discretionary basis, under an aggravated procedure and higher security levels • Creation of closed groups with special authorisation levels for specific projects

  25. Thank you for your attention
