Research Databases for NRES

Research Databases for NRES. London, 29th Feb 2012. JHC roles. Research chair at UoN – epidemiology, risk prediction and drug safety. Member of the ECC NIGB. Developed and run the not-for-profit QResearch database with EMIS. Inner city GP. Outline. Background. Key ethical issues.


Presentation Transcript


  1. Research Databases for NRES London 29th Feb 2012

  2. JHC roles • Research chair at UoN – epidemiology, risk prediction and drug safety • Member of the ECC NIGB • Developed and run the not-for-profit QResearch database with EMIS • Inner city GP

  3. Outline • Background • Key ethical issues • - Scientific • - Confidentiality • Example of QResearch • Data linkage and pseudonymisation • Discussion / questions

  4. Background • Large volumes of electronic data now collected in the NHS • Huge potential for useful research • Technology exists to extract data and assemble it into databases • Databases popular with academics and DH • Large numbers for studies • Relative efficiency • Increasing potential for data linkages

  5. Definition of research database in NRES SOP • “a structured collection of individual level personal information, which is stored for potential research purposes beyond the life of a specific research project with defined end points” • Includes databases set up for research • Re-use of databases established for • - audit • - disease registers

  6. Research databases • Included in NRES SOPs • Specific section within IRAS form • Approvals generally for 5 years renewable • Can include generic approval • Can include providing data to third parties as part of a research service • Detailed protocol required on purpose, operation, methods, policies, governance

  7. New research databases • What is the purpose? • Do we need a new one or can an existing database be used? • Who will ‘own’ it and be responsible for it? • What data will it contain and how will it be accessed? • What is the governance framework? • Will it contain identifiable data +/- consent? • Is s251 support required?

  8. Key objectives for safe data sharing • Maximise public benefit • Minimise risk to privacy • Maintain public trust • (Diagram centred on the patient and their data)

  9. Three main options for data access • s251 • Consent • Pseudonymisation • Diagram labels: maximise public benefit; patient and their data; minimise risk to privacy; maintain public trust

  10. De-identification • Various methods to reduce identifiability of data • Pseudonymisation • Use of samples and limited data items rather than whole database • Conversion of date of birth to year of birth or age • Contracts/data sharing agreements with clear liabilities and penalties
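As an illustration of the de-identification steps on this slide, the following is a minimal sketch, not the method used by any particular database. The field names (`name`, `nhs_number`, `address`, `postcode`, `dob`) are hypothetical; it drops strong identifiers and reduces the date of birth to a year of birth.

```python
from datetime import date

# Hypothetical names of fields treated as strong identifiers (an assumption).
STRONG_IDENTIFIERS = {"name", "nhs_number", "address", "postcode"}


def deidentify(record: dict) -> dict:
    """Return a copy of the record with strong identifiers removed
    and the date of birth reduced to a year of birth."""
    out = {
        key: value
        for key, value in record.items()
        if key not in STRONG_IDENTIFIERS and key != "dob"
    }
    dob = record.get("dob")
    if dob is not None:
        out["year_of_birth"] = dob.year
    return out


if __name__ == "__main__":
    patient = {
        "name": "Example Patient",
        "nhs_number": "943 476 5919",
        "dob": date(1950, 3, 14),
        "sex": "F",
    }
    print(deidentify(patient))
```

Removing fields in this way reduces identifiability but does not remove the need for a risk assessment, since combinations of the remaining items can still be identifying.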

  11. Example for QResearch • Established in 2002 to support ethical medical research • Largest of three UK databases & expanding • Management board – UoN and EMIS. • Advisory board – professional and lay representation. Advises on policy, strategy etc. • Scientific Board – review science and risk assessment.

  12. QResearch key facts • Large pseudonymised database • >700 GP practices, 14 million patients • Patient and event level data • Demographics – year of birth, sex, ethnicity • Diagnoses, lab results, clinical values • Medication, referrals • No free text. No strong identifiers • All research peer reviewed & published.

  13. QResearch uploads • Informed consent from practice • Practice displays notice in waiting room • Practice activates upload software • Data pseudonymised BEFORE data leaves practice • Patients can be opted out of upload • Secure upload to server at EMIS with full NHS security clearance • Backups delivered to University

  14. QResearch - security • Full database stored on offline server • Full encryption of hard drive • Keypad-secured server room with limited access • 24 hour CCTV with monitoring • Confidentiality clauses in staff contracts • Full log of all data accesses • Log of all uses of data • No data losses or breaches in 10 years

  15. QResearch policy • Whilst all data are pseudonymised, we apply the same safeguards as if they were identifiable • To minimise any risk of re-identification of patients (and practices) • To maintain public and professional trust • Explicit policy to ensure all results of research studies are widely and freely available for public benefit.

  16. Researcher access • University based academics • One must be GMC registered • Standard application form • Clarify research question and methods • Independent Scientific review • Provided with sample size and data items needed to answer question • Data only used for agreed purpose • Data destroyed after project completed

  17. Why is it important to ensure robust scientific methods? • Published research must give valid results which don’t mislead or misinform doctors, patients, policy makers • Equally need to avoid unpublished research – eg a good study with important results • Avoid duplication of effort • Avoid publication bias • Avoid suppression of unpopular results (eg side effects of medicines)

  18. Ensuring scientific quality • Is there a clear research question? • Can the data answer the question? • Are the methods scientifically valid? • Are the results likely to be generalisable? • Does the team have the skills to do the project? • Is the researcher free to publish? • Some databases with generic REC agreement will organise independent scientific review to answer the above.

  19. Risk to confidentiality • Each study needs risk assessment even if pseudonymised • Could the study lead to identification of the patients because of • - other data that the researcher might have • - small numbers/rare events • Minimise risk by de-identification of data • Data sharing agreement & sanctions for misconduct.
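One part of such a risk assessment, the check for small numbers or rare events, can be sketched as below. This is an illustrative sketch only: the function name `small_cells`, the field names and the default threshold of 5 are assumptions, not values taken from the presentation.

```python
from collections import Counter


def small_cells(records, keys, threshold=5):
    """Return each combination of values for `keys` that occurs fewer than
    `threshold` times, mapped to its count. The threshold is an assumption."""
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return {combo: n for combo, n in counts.items() if n < threshold}


if __name__ == "__main__":
    extract = (
        [{"age_band": "40-49", "sex": "F"}] * 6
        + [{"age_band": "90+", "sex": "M"}] * 2
    )
    # Combinations flagged here might need suppressing or merging into
    # broader bands before the extract is released to a researcher.
    print(small_cells(extract, ["age_band", "sex"]))
```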

  20. QResearch data linkage study • Linked to deprivation in 2002 • Linked to ONS cause of death in 2007 • Currently being linked to HES and cancer registry • Testing out a new method of data linkage using pseudonymised data • Exceptionally high levels of valid, complete NHS numbers for ONS data, HES, GP data

  21. Open pseudonymiser project • Need approach which doesn’t extract identifiable data but still allows linkage • Legal, ethical and NIGB approvals • Secure, scalable • Reliable, affordable • Generates IDs which are unique to project • Can be used by any set of organisations wishing to share data • Pseudonymisation applied as close as possible to identifiable data, ie within clinical systems

  22. Pseudonymisation: method • Scrambles NHS number BEFORE extraction from clinical system • Takes NHS number + project specific encrypted ‘salt code’ • One way hashing algorithm (SHA-256, from the SHA-2 family) – no known collisions and US standard from 2010 • Applied twice - before leaving clinical system & on receipt by next organisation • Apply identical software to second dataset • Allows two pseudonymised datasets to be linked • Can’t be reverse engineered
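The salted-hash step described on this slide can be sketched as follows. This is a minimal illustration using Python's standard `hashlib`, not the OpenPseudonymiser implementation: the function names, the stripping of spaces, the order of concatenation and the use of a second, receiver-side salt are all assumptions, and the salt is shown as plain text rather than decrypted at run time.

```python
import hashlib


def pseudonymise(nhs_number: str, project_salt: str) -> str:
    """Hash the NHS number concatenated with a project-specific salt.
    Stripping spaces first is an assumption, so that formatting
    differences between systems do not change the resulting ID."""
    cleaned = nhs_number.replace(" ", "")
    return hashlib.sha256((cleaned + project_salt).encode("utf-8")).hexdigest()


def pseudonymise_on_receipt(pseudo_id: str, receiver_salt: str) -> str:
    """Second application of the hash by the receiving organisation.
    Using a separate receiver-side salt here is an assumption."""
    return hashlib.sha256((pseudo_id + receiver_salt).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = pseudonymise("943 476 5919", "hypothetical-project-salt")
    second = pseudonymise_on_receipt(first, "hypothetical-receiver-salt")
    print(first)
    print(second)
```

Because the same input and salt always give the same digest, two suppliers running identical software with the same project salt produce matching IDs, which is what allows two pseudonymised datasets to be linked; a different project salt gives unrelated IDs.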

  23. Web tool to create encrypted salt: proof of concept • Web site private key used to encrypt user defined project specific salt • Encrypted salt distributed to relevant data supplier with identifiable data • Public key in supplier’s software to decrypt salt at run time and concatenate to NHS number (or equivalent) • Hash then applied • Resulting ID then unique to patient within project

  24. Openpseudonymiser.org • Website • Desktop application • Software for integration • Test data • Documentation • Utility to generate encrypted salt codes • Source code GNU GPL

  25. Progress so far • Pseudonymised entire: • - HES database since 1997 • - Cause of death data since 1993 • - Cancer registrations since 1990 • Linked all three datasets based only on pseudo NHS number - >99% complete • Due to link GP data Spring 2012 • Implementing into major GP computer systems
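Linking datasets on the pseudonymised NHS number alone amounts to a join on that ID. The sketch below is illustrative only: the field name `pseudo_id`, the example rows and the helper names are hypothetical, and it assumes at most one row per ID in the right-hand dataset.

```python
def link(left, right, key="pseudo_id"):
    """Inner join of two lists of dict rows on the pseudonymised ID.
    Assumes `right` holds at most one row per ID."""
    index = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked


def match_rate(left, right, key="pseudo_id"):
    """Proportion of rows in `left` whose ID also appears in `right`."""
    if not left:
        return 0.0
    right_ids = {row[key] for row in right}
    return sum(1 for row in left if row[key] in right_ids) / len(left)


if __name__ == "__main__":
    gp = [
        {"pseudo_id": "a1", "diagnosis": "asthma"},
        {"pseudo_id": "b2", "diagnosis": "diabetes"},
    ]
    hes = [{"pseudo_id": "a1", "admissions": 2}]
    print(link(gp, hes))
    print(match_rate(gp, hes))
```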

  26. Key points • Pseudonymisation at source • Instead of extracting identifiers and storing lookup tables/keys centrally, the technology to generate the key is stored within the clinical systems • Use of project specific encrypted salted hash ensures secure sets of IDs unique to project • Full control of data controller • Can work in addition to existing approaches • Open source technology so transparent & free

  27. Definition of clinical care team • Important as determines whether s251 required • Tendency by research community to adopt very broad definition to justify access • Definition is tricky; as a guide: • - Individual has a duty of care to patient • - Has duty of confidence • - Would be recognised in that role by a reasonable patient

  28. Implications of Open Data • VERY difficult to see how patient level data can be suitably de-identified so that it can be published online to meet Cameron’s promises • Current work on de-identification standard by IC/DH to help custodians decide when data can be published.
