Development of UK Virtual Microdata Laboratory • Felix Ritchie • Shanghai, March 2010
Plan of presentation • Starting principles • What we did, and the impact • New things we had to develop • security model, researcher management, SDC • What we’ve learnt • what matters, what doesn’t, what we’d do differently • Future directions
Starting principles • Designed by researchers for research • maximum access, limited by law • Expandable • Secure at reasonable cost • Manageable at reasonable cost • Distribute access, not data
Distributed access • Why is this good? • Data always under ONS control • Live monitoring • Simpler, but safer, disclosure control • How does this work in practice? • VML accessible from all ONS computers • Access points in govt. offices in Glasgow and Belfast • Plan to roll out to more govt. offices in 2010 • VML-duplicate set up on academic network • VML set to become exception rather than default data store
What we did • Central data repository and processors • Access via secured thin clients • Work space partitioned by dataset, not usage • researchers get access to dataset, not variables • No access to internet or rest of network • Same system for internal and external users
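The dataset-level partitioning described above can be sketched as a simple permission check. This is a minimal illustration, not the VML's actual access-control code: the names `GRANTS`, `can_read`, the researcher ID, and the dataset labels are all hypothetical.

```python
# Hypothetical grants: each researcher is approved for whole datasets.
# There is deliberately no per-variable permission layer.
GRANTS = {
    "researcher_a": {"dataset_x", "dataset_y"},
}


def can_read(researcher, dataset, variable=None):
    """Return True if the researcher may read the dataset.

    The `variable` argument is accepted but ignored: access is decided
    per dataset, so every variable in an approved dataset is visible.
    """
    return dataset in GRANTS.get(researcher, set())
```

Under this sketch, `can_read("researcher_a", "dataset_x", "turnover")` and `can_read("researcher_a", "dataset_x", "employment")` give the same answer, which is the point of partitioning by dataset rather than by variable.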
What we did - outcomes • 30%-50% growth every year • Massive increase in microeconomic analysis • From almost no firm-level studies to European leaders • Keystone of ONS Administrative Data Project • Total cost ~£350,000 per year • strategy 17%, fixed ops 65%, variable ops 18% • income ~£50,000
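The cost split quoted above can be turned into approximate annual amounts. The figures are the rounded ones from the slide; the variable and function names are illustrative only, and the results inherit the "~" approximation.

```python
TOTAL_COST_GBP = 350_000   # approximate annual total, from the slide
INCOME_GBP = 50_000        # approximate annual income, from the slide

# Percentage shares from the slide (they sum to 100).
SHARES_PCT = {"strategy": 17, "fixed ops": 65, "variable ops": 18}


def cost_breakdown(total, shares_pct):
    """Split a total into components using integer percentage shares."""
    return {name: total * pct // 100 for name, pct in shares_pct.items()}


breakdown = cost_breakdown(TOTAL_COST_GBP, SHARES_PCT)
net_cost = TOTAL_COST_GBP - INCOME_GBP

for name, amount in breakdown.items():
    print(f"{name}: ~£{amount:,}")
print(f"net of income: ~£{net_cost:,}")
```

On these figures, fixed operations account for roughly £227,500 of the total, and income covers about one seventh of the annual cost.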
New things developed (1) The VML Security Model • valid statistical purpose → safe projects • trusted researchers → safe people • anonymisation of data → safe data • technical controls around data → safe setting • disclosure control of results → safe outputs • safe projects + safe people + safe data + safe setting + safe outputs = safe use
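One way to read the security model above is that "safe use" requires all five elements together. The sketch below encodes that reading with simple yes/no flags. This is a deliberate simplification for illustration, not the VML's actual approval procedure, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AccessRequest:
    safe_project: bool   # valid statistical purpose
    safe_people: bool    # trusted researchers
    safe_data: bool      # anonymisation of data
    safe_setting: bool   # technical controls around data
    safe_outputs: bool   # disclosure control of results


def failed_controls(request):
    """List the elements of the model that the request does not satisfy."""
    return [name for name, ok in asdict(request).items() if not ok]


def is_safe_use(request):
    """Safe use holds only when every element is in place."""
    return not failed_controls(request)
```

For example, a request with everything in place except trained researchers would fail with `["safe_people"]`, showing which part of the model needs attention rather than just returning a refusal.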
New things developed (2) Output statistical disclosure control • ‘Standard’ SDC not appropriate • traditional rules not appropriate for research environments • SDC on data or methods pointless • Principles-based output SDC • SDC at the point of release • trained researchers • trained staff • agreement on principles and purpose • safe vs unsafe outputs, based on functional form
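Classifying outputs by functional form can be sketched as a lookup applied at the point of release. The slide does not list which forms fall into each class, so the entries in `ASSUMED_CLASSES` are illustrative assumptions, as are all names in the block. Unrecognised forms default to the cautious class.

```python
from enum import Enum


class Verdict(Enum):
    SAFE = "release unless there is a specific reason for concern"
    UNSAFE = "hold for review by trained staff before release"


# Assumed example classification; not taken from the slides.
ASSUMED_CLASSES = {
    "regression_coefficients": Verdict.SAFE,
    "frequency_table": Verdict.UNSAFE,
    "minimum_or_maximum": Verdict.UNSAFE,
}


def classify_output(functional_form):
    """Classify an output by its functional form, not by its content.

    Anything not explicitly listed is treated as UNSAFE, so new kinds
    of output get a human check by default.
    """
    return ASSUMED_CLASSES.get(functional_form, Verdict.UNSAFE)
```

The verdict is a default that trained staff and researchers can override by agreement, which is what distinguishes a principles-based approach from fixed rules.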
New things developed (3) Active researcher management • Need to develop shared objectives with researchers • Principles-based SDC needs buy-in from researchers • Reduced management costs • Compulsory training • SDC • VML objectives and constraints • legal and procedural background
What we’ve learnt (1) Things that matter • attitude to researchers • model of SDC • broad scale of operations • including future plans • scale of coherent networks • (for remote access) • e.g. ONS internal network, Government Secure Intranet, University Intranet, VPN?
What we’ve learnt (2) Things that don’t matter • Location of servers and users • Type of users • Type of data • IT • Metadata • Specific legal/procedural framework?
What we’ve learnt (3) Things we would do differently • Prepare ONS for expansion • senior buy-in • IT planning • better data management • better user management • better metadata
Future directions • Expansion across the government network • Supporting academic equivalent • VML facing massive internal increase in use • Developing international standards • Better communication • wikis, FAQs, common metadata system • metadata • Not being considered • remote job systems • synthetic data
Questions? Felix Ritchie email@example.com Microdata Analysis and User Support firstname.lastname@example.org
The data model (1) • ‘Spectrum’ of access points balancing • value of data • ease of use • disclosure risk • for a given level of confidentiality, maximise data use and convenience • no ‘one-size-fits-all’ solution • no absolute prohibitions • trade-off is made explicit • users determine appropriate level of access