
Data harmonisation and the value of cross-cohort analysis: an overview of the new CLOSER programme

Jane Elliott, Director of the Centre for Longitudinal Studies and Director of CLOSER (J.Elliott@ioe.ac.uk)


Presentation Transcript


  1. Data harmonisation and the value of cross-cohort analysis: an overview of the new CLOSER programme • Jane Elliott, Director of the Centre for Longitudinal Studies and Director of CLOSER • J.Elliott@ioe.ac.uk

  2. Summary • A brief overview of CLOSER • Early progress on harmonisation work packages: biological structure; socioeconomic status and qualifications • Uniform Search Platform • Contextual database • Benefits of cross-cohort analysis

  3. Cohorts and Longitudinal Studies Enhancement Resources = CLOSER • Nine longitudinal studies: Hertfordshire Cohort Study; 1946 British Birth Cohort; 1958 British Birth Cohort; 1970 British Birth Cohort; ALSPAC (Avon Longitudinal Study of Parents and Children); Millennium Cohort Study; Southampton Women’s Study; Life Study; Understanding Society • Funded by ESRC and MRC

  4. Objectives & timetable • Maximise the use, value and impact of data collected through a portfolio of key UK longitudinal studies • Stimulate interdisciplinary research across major longitudinal studies • Provide common resources for research • Assist with training and development • Share information and expertise between study teams • 1st October 2012 – 30th September 2017

  5. Work streams • 4 work packages on data harmonisation • 3 work packages on data linkage • Core work on: Impact (led by the British Library); Training and Capacity Building; Uniform Search Platform • Leadership team contributing to strategic planning, sharing of best practice, funders’ strategies • See our website, www.CLOSER.ac.uk, for further information • Twitter: @CLOSER_UK

  6. Leadership team • Work packages: WP1: Harmonisation of biological structure and function; WP2: Harmonisation socio-economic resources; WP3: Harmonisation analysis of biological samples; WP4: Harmonisation measures of vision; WP5: Data linkage administrative data; WP6: Data linkage - geography; WP7: Data linkage - health data • Studies: 1946 cohort; 1958 cohort; 1970 cohort; ALSPAC; MCS; Understanding Society; SWS; HCS; Life Study • Core: Impact; Metadata; Training and capacity building; Uniform Search Platform

  7. Vision for the USP • Portal to discovery of hundreds of thousands of variables, questions and data collection instruments across the nine longitudinal studies: covering survey and biomedical data collection; promoting CLOSER harmonisation work; state-of-the-art searching tool; focus on improving visibility of associations between (currently) disparate metadata items; shared subject/topic classification • We should remember that this is massively ambitious: something that matches or surpasses the best multi-study metadata repository out there, the RAND Survey Meta Data Repository covering the HRS family of studies: https://mmicdata.rand.org/megametadata/

  8. Why do it? • Benefits to users: single resource discovery portal, replacing a fractured resource discovery landscape; lowers barriers to conducting cross-cohort analysis; increased visibility of cohort data and resources • Benefits to data managers: standardised metadata management workflows (metadata currently curated in isolation); workflows in place for future ‘joiners’ • Benefits to Principal Investigators/survey commissioners: makes prospective harmonisation easier; promotion and re-use of tested questions and instruments

  9. Assumptions, constraints • Not a data repository • Not a major software development project: most of the budget is for metadata creation/enhancement • DDI-L agreed as standard for metadata exchange: covers subject areas (bio and soc science) and data collection methods (‘hard’ instrument and survey); designed for marking up longitudinal/repeated metadata items • Colectica Designer selected as preferred metadata ingest/editing software

  10. Challenges • Legacy metadata: • elderly and decrepit! • not always designed for equivalence within a study, much less across studies • differing or non-existent naming conventions • substantial (manual) effort required to establish equivalences and level of equivalence • Metadata managed by five or six different units: different formats, workflows, vocabularies • Relative lack of familiarity with DDI-L: • uneven knowledge across study units

  11. Metadata: State of play • >200k variables • c.150 data collections: CAI, PAPI, nurse visit, clinic-based protocol, biosamples, etc. • c.85 validated survey instruments: GHQ, AUDIT, Malaise Inventory, etc.; c.10 instruments used in >1 study • c.20 validated clinical measures: blood pressure, bone density, lung function, etc.; range of instruments used • c.15 cognitive or physical tests

  12. How to do it? • USP will be a web interface that sits on top of a central repository fed by metadata created and delivered both by the individual study units and the CLOSER core • Study units continue to curate metadata as they see fit, but not in conflict with the proposed USP metadata profile • Substantial metadata creation and enhancement to be undertaken by the study units: inputting historical questionnaires; mapping between data items and data collection • CLOSER core responsible for identifying common (cross-study) variable and question schemes, allowing studies to reference these and also any agreed controlled vocabularies (concept, life stage etc.)
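
The arrangement described on this slide can be pictured with a toy sketch: study units deliver metadata records, a central store collects them, and each record references a term from a shared vocabulary so that related variables can be found across studies. Everything here is hypothetical (the class names, field names, study names and vocabulary terms are invented for illustration), and it does not use DDI-L or Colectica; it only shows the shape of the idea.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VariableRecord:
    """One metadata item delivered by a study unit (all fields hypothetical)."""
    study: str    # name of the contributing study
    name: str     # study-specific variable name
    label: str    # human-readable variable or question label
    concept: str  # term from a shared controlled vocabulary


@dataclass
class Repository:
    """Toy central repository: study units add records, users search them."""
    records: list = field(default_factory=list)

    def ingest(self, batch):
        """Append a batch of records delivered by one study unit."""
        self.records.extend(batch)

    def search(self, text):
        """Case-insensitive substring match on labels, across all studies."""
        needle = text.lower()
        return [r for r in self.records if needle in r.label.lower()]

    def by_concept(self, concept):
        """Group variable names by study for one shared concept."""
        grouped = {}
        for r in self.records:
            if r.concept == concept:
                grouped.setdefault(r.study, []).append(r.name)
        return grouped


repo = Repository()
# Two invented study units deliver their own, differently named variables.
repo.ingest([
    VariableRecord("StudyA", "ht_7", "Height at age 7 (cm)", "body_size"),
    VariableRecord("StudyA", "sbp_43", "Systolic blood pressure at 43", "blood_pressure"),
])
repo.ingest([
    VariableRecord("StudyB", "HEIGHT10", "Child height, age 10 sweep", "body_size"),
])

print(repo.by_concept("body_size"))
```

The shared `concept` field is what lets differently named variables surface together; in this sketch it stands in for the agreed controlled vocabularies mentioned on the slide.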

  13. Contextual database - rationale • Life course approach stresses the importance of the connection between individuals and the historical and socioeconomic context in which these individuals lived • But some research based on cohort studies pays little attention to the social, economic or historical context that helps shape the lives of individuals • Some data on social change and social context will come from the studies themselves (e.g. breastfeeding) • Aim of the contextual database is to provide a central source of key indicators over time likely to be of direct relevance to cohort research

  14. Source: Changing Britain, Changing Lives: Three generations at the turn of the century, Table 8.3 (Wadsworth et al)

  15. Proportion of women in paid employment, by age and cohort Source: Jenny Neuburger - Paper presented at CLS June 2008

  16. Contextual database - elements Also want to include policy narratives and a bibliography

  17. Work package 1: Biological structure and function • Two years: March 2013 to February 2015 • William Johnson & Rebecca Hardy, MRC Unit for Lifelong Health and Ageing • Body size and composition • Cognitive performance • Blood pressure • Physical capability

  18. Research priority • Body size, because of the obesity epidemic and the long-term consequences of adiposity for health & well-being • Need for harmonisation:

  19. First papers • Compare body size distributions and mean trajectories, across different phases of the life course, between cohorts • Investigate how socioeconomic position (SEP) inequalities in body size trajectories, across different phases of the life course, differ between cohorts • Li L et al. Am J Epidemiol. 2008 • Howe LD et al. JECH. 2012

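  20. Studies • [Figure: ages (in years) at data collection, one sequence per study; study labels not preserved in the transcript] • 0, 1, 3, 5, 7 • 0, 7, 8, 9, 10, 11, 12, 13, 15, 18 • 0, 5, 10, 16, 26, 30, 34 • 0, 7, 11, 16, 23, 33, 42, 44, 50 • 0, 2, 4, 6, 7, 11, 15, 20, 26, 36, 43, 53, 60-64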

  21. Data

  22. Challenges • Between studies: data covering different age ranges; data increasingly positively skewed in more recent studies • Within individuals: different number of observations at different exact ages; different precision of data • Within and between individuals: both measured and self-report data

  23. What we are aiming to achieve: 1) Demonstration research project focussing on socioeconomic differences in growth and obesity across cohorts 2) A harmonised dataset, with accompanying documentation for other users

  24. Socio-economic data harmonisation work package • Claire Crawford, Brian Dodgeon, Tim Morris, Sam Parsons, Anna Vignoles (lead) • Two years: April 2013 to March 2015

  25. What measures? • Measures to be harmonised are: parental education level; cohort member level of education; socio-economic (occupation) status; household equivalised income; home ownership • Cohorts: NSHD; NCDS; BCS; ALSPAC; MCS
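
For readers unfamiliar with equivalised income, the sketch below shows one common way of computing it. The presentation does not say which equivalence scale the work package uses; the OECD-modified scale is assumed here purely for illustration (1.0 for the first adult, 0.5 for each further person aged 14 or over, 0.3 for each child under 14), and the function names are invented.

```python
def oecd_modified_factor(n_aged_14_plus, n_children_under_14):
    """Equivalence factor under the OECD-modified scale (assumed scale).

    1.0 for the first adult, 0.5 for each further person aged 14 or over,
    0.3 for each child under 14.
    """
    if n_aged_14_plus < 1:
        raise ValueError("a household needs at least one person aged 14 or over")
    if n_children_under_14 < 0:
        raise ValueError("number of children cannot be negative")
    return 1.0 + 0.5 * (n_aged_14_plus - 1) + 0.3 * n_children_under_14


def equivalised_income(household_income, n_aged_14_plus, n_children_under_14):
    """Household income divided by the household's equivalence factor."""
    factor = oecd_modified_factor(n_aged_14_plus, n_children_under_14)
    return household_income / factor


# A couple with two young children has a factor of about 2.1, so the same
# cash income counts for less than it would in a single-person household.
print(equivalised_income(42000, 2, 2))
print(equivalised_income(42000, 1, 0))
```

Harmonising this measure across cohorts would also require comparable income definitions and price adjustment, which this sketch leaves out.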

  26. Priority measures agreed • Highest qualification (vocational/academic separately) held at every age • Age left full-time education • Whether the person went past compulsory schooling • Average GCSE score or equivalent • GCSE grades in mathematics and English (not for all cohorts) • For cohort members’ parents: age left full-time education and highest qualification at birth of the cohort member (CM) • Grandparents’ age left school
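
The measure "went past compulsory schooling" illustrates why harmonisation is more than renaming variables: the same leaving age means different things in different cohorts. A minimal sketch follows, assuming the England and Wales minimum school leaving age (15 from 1947, 16 from 1972) applies to each birth cohort as shown. The lookup table and function name are hypothetical, and a real derivation would need country-specific and date-of-birth-specific rules.

```python
# Minimum school leaving age assumed to apply to each birth cohort.
# Assumption for illustration: England and Wales rules (15 from 1947,
# 16 from 1972), applied to the 1946, 1958 and 1970 birth cohorts.
LEAVING_AGE_BY_BIRTH_YEAR = {1946: 15, 1958: 16, 1970: 16}


def stayed_past_compulsory(birth_year, age_left_education):
    """True if the person left full-time education after the minimum age."""
    minimum = LEAVING_AGE_BY_BIRTH_YEAR[birth_year]
    return age_left_education > minimum


# Leaving at 16 counts as staying on for someone born in 1946,
# but not for someone born in 1958 under these assumed rules.
print(stayed_past_compulsory(1946, 16))
print(stayed_past_compulsory(1958, 16))
```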

  27. Measures available by cohort

  28. The value of cross-cohort analysis • A meta-narrative of societal change over time • Creating a synthetic life course: understanding lifetime trajectories • Investigate cohort effects: examining the impact of different social and policy contexts • Replication of results: checking the robustness of models • Larger N and greater power • Decompose age and period effects
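
The last point on this slide rests on a simple identity, sketched here under a deliberately simple linear model. The symbols and the model are illustrative assumptions, not taken from the presentation: $a$ is age, $c$ is year of birth, $p$ is the calendar year of measurement, and $y$ is any outcome.

```latex
\begin{align}
  p &= c + a, \\
  \mathbb{E}[y \mid a, c] &= \alpha + \beta a + \gamma p
    = (\alpha + \gamma c) + (\beta + \gamma)\,a, \\
  \mathbb{E}[y \mid a, c_2] - \mathbb{E}[y \mid a, c_1] &= \gamma\,(c_2 - c_1).
\end{align}
```

Within a single cohort $c$ is fixed, so age and period move together and only the sum $\beta + \gamma$ can be estimated. Comparing two cohorts at the same age recovers $\gamma$, and hence $\beta$, but only under the assumption made here that there is no separate cohort effect; with all three effects in a linear model the identity in the first line leaves them unidentified without further constraints.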

  29. Lifetime systolic blood pressure trajectories and velocities (predicted means) • Panels: Men; Women • Source: Wills et al. PLOS Med, 2011
