
Data Management for Longitudinal Data

Longitudinal Studies Seminars: Longitudinal Analyses Using STATA. Stirling University, 20.7.04. Data and Variable Management. Paul Lambert. Data Management for Longitudinal Data. The nature of ‘large and complex’ longitudinal resources: complicating the variable by case matrix.

Presentation Transcript


  1. Longitudinal Studies Seminars: Longitudinal Analyses Using STATA. Stirling University, 20.7.04. Data and Variable Management. Paul Lambert. 20.7.04: LSS

  2. Data Management for Longitudinal Data

  3. The nature of ‘large and complex’ longitudinal resources: complicating the variable by case matrix

  4. Large and complex = complexity in:
     • Multiple hierarchies of measurement
     • Array of variables / operationalisations
     • Relations between / subgroups of cases
     • Multiple points of measurement
     • Balanced or unbalanced repeated contacts
     • Censored duration data
     • Sample collection and weighting

  5. i) Multiple hierarchies (levels) of measurement
     • Common examples:
       • Both individuals and households
       • Schools and pupils
       • People and local districts and regions
     • Solutions:
       • Separate VxC (variable by case) matrix for each level, eg BHPS
       • Merged VxC matrix at lowest level

  6. Illustration: Hierarchical dataset

  7. ii) Array of variables
     • Vast number of variable responses, eg 1,000+
     • Recoding multiplies these up, eg dummies
     • Multiple response variables (‘all that apply’)
     • Categorisations / indexes (eg occupations)
     • Implication:
       • Either separate files for separate variable groups
       • Or very long and difficult files…

  8. iii) Relations between cases
     • All respondents in a household
     • Husbands and wives both sampled
     • Fellow school pupils sampled
     • Longitudinal: differing relations with others at different times
     • Outcomes:
       • Link information between related cases

  9. iv) Multiple measurement points
     • Longitudinal: information on same cases for multiple time points
     • Panel or cohort: several records via repeated contact for each individual
       • Problems of ‘unbalanced’ panels
     • Life history / retrospective:
       • Durations in spells: multistate / multiepisode, overlapping spells; time varying covariates
       • Left or right censoring of durations in spells
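A panel is ‘unbalanced’ when some cases are missing from some waves. A minimal sketch of that check, in plain Python rather than Stata; the (pid, year) records are invented for illustration and are assumed to contain no duplicate person-year pairs:

```python
from collections import Counter

# Hypothetical long-format panel: one (pid, year) pair per interview.
panel = [(1, 1992), (1, 1993), (1, 1994), (2, 1992), (2, 1994), (3, 1993)]

# All waves observed anywhere in the data.
waves = sorted({year for _, year in panel})

# Number of waves in which each person appears.
records_per_person = Counter(pid for pid, _ in panel)

# People missing at least one wave make the panel unbalanced.
incomplete = sorted(pid for pid, n in records_per_person.items() if n < len(waves))
is_balanced = not incomplete

print(waves, incomplete, is_balanced)
```

Here persons 2 and 3 each miss at least one wave, so the panel counts as unbalanced.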

  10. v) Sample collection / weighting
      • Multistage cluster particularly popular
      • Sample may have been clustered, stratified
      • Longitudinal: uneven inclusion of cases over time
      • Sample weights designed to solve, but:
        • Complex in application
        • Not suited to all applications

  11. Data Management for Longitudinal Data

  12. STATA data management examples: see datmanag_part1.do
      Claim: For data management, STATA is powerful, but not always well designed
      • Batch files / interactive syntax / programs
      • Data entry / browsing
      • Variable labels
      • Computing / recoding
      • Missing values
      • Weighting data
      • Survey estimators (svy)

  13. Data Management for Longitudinal Data

  14. Typology of longitudinal data files
      • 3 sets of contrasts:
        • Repeated X-section / Panel / Cohort / Event History / Time Series
        • Wide vs Long
        • Discrete vs Continuous time
      See datmanag_part2.do

  15. Contrast 1, Type A: Repeated cross-sectional data

  16. Contrast 1, Type B: Panel dataset (unbalanced)

  17. Contrast 1, Type C: Event history data analysis
      • Alternative data sources:
        • Panel / cohort (more reliable)
        • Retrospective (cheaper, but recall errors)
      • Aka: ‘Survival data analysis’; ‘Failure time analysis’; ‘hazards’; ‘risks’; ..
      Focus shifts to length of time in a ‘state’: analyses the determinants of time in state

  18. Key to event histories is ‘state space’

  19. Contrast 1, Type D: Time series data
      Statistical summary of one particular concept, collected at repeated time points from one or more subjects
      Examples:
      • Unemployment rates by year in UK
      • University entrance rates by year by country
      Note: exact equivalence to panel data format

  20. Contrast 2: ‘Wide’ versus ‘Long’ format
      Relevant to all types of dataset:
      • ‘Wide’ = 1 case per record (person), additional vars for time points:
          Person 1  Sex  YoB  Var1_92  Var1_93  Var1_94  …
          Person 2  …
      • ‘Long’ = 1 case per time point within person (as panel data example)
      • STATA: ‘reshape’ command allows transfer between the two formats
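To make the wide/long contrast concrete, here is a small sketch of the transformation in plain Python. It is not Stata's `reshape` itself, only an illustration of the idea; the data values, the lowercase variable names and the helper functions `wide_to_long` and `long_to_wide` are all hypothetical.

```python
# Hypothetical wide-format data in the slide's layout (one record per person).
wide = [
    {"pid": 1, "sex": "F", "yob": 1960, "var1_92": 10, "var1_93": 12, "var1_94": 11},
    {"pid": 2, "sex": "M", "yob": 1971, "var1_92": 7, "var1_93": None, "var1_94": 9},
]


def wide_to_long(rows, stub):
    """Split columns named <stub>_<yy> into one record per person-year."""
    prefix = stub + "_"
    long_rows = []
    for row in rows:
        # Time-constant variables are copied onto every person-year record.
        fixed = {k: v for k, v in row.items() if not k.startswith(prefix)}
        for key, value in row.items():
            if key.startswith(prefix):
                year = int(key[len(prefix):])
                long_rows.append({**fixed, "year": year, stub: value})
    return long_rows


def long_to_wide(rows, stub, id_var="pid", time_var="year"):
    """Collapse person-year records back to one record per person."""
    people = {}
    for row in rows:
        person = people.setdefault(
            row[id_var],
            {k: v for k, v in row.items() if k not in (time_var, stub)},
        )
        person[f"{stub}_{row[time_var]}"] = row[stub]
    return list(people.values())


long_data = wide_to_long(wide, "var1")
print(len(long_data), long_data[0])
```

Two people observed at three time points give six long-format records, and `long_to_wide(long_data, "var1")` recovers the original wide records.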

  21. Contrast 3: Continuous vs Discrete time
      Primarily in terms of event history datasets
      • Continuous time (‘spell files’, ‘event oriented’)
        • One episode per case, time in case is a variable
      • Discrete time
        • One episode per time unit, type of event and event occurrence as variables
      • Analyses: Most packages can handle either format comfortably
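The two layouts hold the same information, which a short sketch can show by expanding a continuous-time spell file into discrete-time records. This is an illustration in plain Python, not a Stata procedure. The spell data and the function `spells_to_person_periods` are hypothetical, and it assumes yearly time units, an inclusive end year, and `event = 0` marking a right-censored spell.

```python
# Hypothetical spell file: one record per episode, with its start and end year.
spells = [
    {"pid": 1, "state": "unemployed", "start": 1992, "end": 1994, "event": 1},
    {"pid": 2, "state": "unemployed", "start": 1993, "end": 1994, "event": 0},
]


def spells_to_person_periods(spell_rows):
    """Expand each spell into one record per year spent in the state."""
    rows = []
    for spell in spell_rows:
        for year in range(spell["start"], spell["end"] + 1):
            # The event can only occur in the final year of a completed spell;
            # censored spells never record an event.
            occurred = 1 if (year == spell["end"] and spell["event"] == 1) else 0
            rows.append(
                {"pid": spell["pid"], "state": spell["state"], "year": year, "event": occurred}
            )
    return rows


person_periods = spells_to_person_periods(spells)
for row in person_periods:
    print(row)
```

The completed three-year spell becomes three records with the event flagged only in the last one, while the censored spell contributes records with no event at all.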

  24. Data Management for Longitudinal Data

  25. Matching files
      • Complex data inevitably involves more than one related data file
      • A vital data analysis skill!!
      • Link data between files by connecting them according to key linking variable(s)
        • Eg, ‘person identifier’ variable ‘pid’
        • Eg: http://iserwww.essex.ac.uk/bhps/doc/
      See datmanag_part3.do

  26. Types of file matching
      • Case-to-case matching
        • One-to-one link, eg two files with different sets of variables for same people
        • STATA: append or merge
      • Table distribution
        • One-to-many link, eg one file has individuals, another has households, and match household info to the individuals
        • STATA: merge
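Table distribution can be sketched as a keyed lookup: each individual record picks up the variables of its household. The example below is plain Python rather than Stata's `merge`; the data, the linking variable name `hid` and the function `distribute` are assumptions made for illustration.

```python
# Hypothetical household file, keyed by household identifier.
households = {
    10: {"region": "Scotland", "hhsize": 2},
    11: {"region": "Wales", "hhsize": 1},
}

# Hypothetical individual file: several people can share one household.
individuals = [
    {"pid": 1, "hid": 10, "age": 34},
    {"pid": 2, "hid": 10, "age": 36},
    {"pid": 3, "hid": 11, "age": 58},
]


def distribute(person_rows, household_table, key="hid"):
    """Copy household-level variables onto every person in that household."""
    return [{**person, **household_table.get(person[key], {})} for person in person_rows]


matched = distribute(individuals, households)
for row in matched:
    print(row)
```

The output keeps one record per individual, so household information is repeated for everyone who shares a household.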

  27. Types of file matching ctd
      • Aggregating
        • Summarise over multiple cases then link summaries back to cases
        • STATA: collapse
      • Related cases matching
        • Link info from one related case to another case, eg info on spouse put on own case
        • STATA: merge or joinby
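Both operations reduce to building a lookup table and reading from it. The sketch below, in plain Python and not a rendering of `collapse` or `joinby`, first computes a household mean and attaches it to each member, then copies a spouse's value onto the respondent's own record. All data and variable names (`hid`, `spouse_pid`, `income`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical person file with household and spouse identifiers.
people = [
    {"pid": 1, "hid": 10, "spouse_pid": 2, "income": 200},
    {"pid": 2, "hid": 10, "spouse_pid": 1, "income": 100},
    {"pid": 3, "hid": 11, "spouse_pid": None, "income": 150},
]

# Aggregating: summarise income over the members of each household.
incomes_by_household = defaultdict(list)
for person in people:
    incomes_by_household[person["hid"]].append(person["income"])
household_mean = {hid: sum(v) / len(v) for hid, v in incomes_by_household.items()}

# Related cases: index people by pid so a spouse's record can be looked up.
by_pid = {person["pid"]: person for person in people}

linked = []
for person in people:
    spouse = by_pid.get(person["spouse_pid"])
    linked.append(
        {
            **person,
            "hh_mean_income": household_mean[person["hid"]],
            "spouse_income": spouse["income"] if spouse else None,
        }
    )

for row in linked:
    print(row)
```

People without a sampled spouse keep a missing value for `spouse_income` rather than being dropped.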

  28. STATA file matching crib:
      _merge = indicator of cases present for:
      • 1 = Master file but not input file
      • 2 = Input file but not Master file
      • 3 = Master and input file
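The coding of `_merge` can be reproduced in a few lines, which may help when checking the result of a match. This is a plain Python imitation of the indicator, not Stata's implementation; the two small files and the function `merge_with_indicator` are hypothetical, and the key is assumed to be unique within each file.

```python
# Hypothetical master and input files sharing the linking variable 'pid'.
master = [{"pid": 1, "sex": "F"}, {"pid": 2, "sex": "M"}]
input_file = [{"pid": 2, "job": "nurse"}, {"pid": 3, "job": "clerk"}]


def merge_with_indicator(master_rows, input_rows, key="pid"):
    """One-to-one match that records where each case was found."""
    master_by_key = {row[key]: row for row in master_rows}
    input_by_key = {row[key]: row for row in input_rows}
    merged = []
    for value in sorted(master_by_key.keys() | input_by_key.keys()):
        in_master = value in master_by_key
        in_input = value in input_by_key
        # 1 = master only, 2 = input only, 3 = present in both files.
        code = 3 if (in_master and in_input) else (1 if in_master else 2)
        merged.append(
            {**master_by_key.get(value, {}), **input_by_key.get(value, {}), "_merge": code}
        )
    return merged


result = merge_with_indicator(master, input_file)
for row in result:
    print(row)
```

Tabulating the indicator after a match is a quick way to see how many cases failed to link in either direction.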
