Data Linkage for Educational Research. Royal Statistical Society, March 19th 2007. Andrew Jenkins and Rosalind Levačić, Institute of Education, University of London. Examples of Data Linkage.
Introduce the datasets used
Outline why data linkage was useful/important
How the datasets were combined
Any practical problems which arose in linking data
Methodological issues in using linked data
To use data from first wave of Longitudinal Survey of Young People in England (LSYPE), combined with other datasets, to try to separate out effects of family background and neighbourhood on students’ attainment.
Purpose of survey: to collect data as part of a 3.5-year evaluation of Diversity Pathfinders (DP), 2002-2006.
Six Local Authorities were provided with some funding by DfES to promote collaboration between groups of secondary schools, with the purpose of raising standards and promoting diversity through attaining specialist status.
Largely a qualitative study using interviews and some participant observation, supplemented by an analysis of examination performance and a survey of students’ views and experiences ‘before’ and ‘after’ three years.
Since DP was ‘pathfinding’, it was not a uniform treatment with controls.
By intention each LA developed its own approach and own way of selecting and grouping schools for collaboration within the DP project.
The research team selected 31 schools as case studies, for which evidence was collected through interviews.
These schools were also the ones selected for a survey.
Each school selected one mixed ability Year 11 form to respond to the survey on-line.
how did students rate aspects of their learning experience?
did students in 2005/6 rate their learning experiences better than those in 2002/3, especially with regard to increased working with students from other schools?
did students’ learning experiences differ by school and by student characteristics?
did more disadvantaged students have an improved learning experience after 3 years of DP?
NPD provides data on student’s
Obtaining data on student characteristics without needing to ask intrusive questions on the survey or extend length of questionnaire;
Did not need to use the alternative of asking schools to supply the data, which would add to the survey burden on schools and further reduce the response rate.
NPD consists of the Pupil Level Annual School Census (PLASC) plus test results.
Each pupil has a Unique Pupil Number (UPN) used by the school when reporting data to DfES.
We needed the schools to give us the UPNs of the students in the form doing the survey. Also date of birth (DoB), in case it was needed for matching.
UPNs are highly confidential, so a letter from DfES was sent to schools requesting them.
Problem: getting UPNs out of each school. 28 schools responded in 2002/3: only 16 in 2005/6.
Each pupil given a DP project identifier number which was attached to a questionnaire.
At school, each pupil used the ID number to download their own questionnaire.
NPD uses a pupil matching reference number (PMR).
We sent the UPNs to DfES; they matched them to PMRs and sent us the matched NPD data for these students.
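The linkage chain described above (project identifier to UPN, UPN to pupil reference number, reference number to NPD record) can be sketched as a series of key lookups. This is a minimal illustration in Python, not the procedure DfES or the project team actually ran. Every identifier, field name (including the "fsm" flag) and value below is invented for the example.

```python
# Hypothetical toy data: all identifiers, field names and values are invented.
survey = [
    {"project_id": "DP001", "rating": 4},
    {"project_id": "DP002", "rating": 2},
    {"project_id": "DP003", "rating": 5},
]
# Supplied by schools: project identifier -> UPN.
project_to_upn = {"DP001": "UPN-A", "DP002": "UPN-B"}
# Held by the data owner: UPN -> pupil reference number.
upn_to_ref = {"UPN-A": "REF-1", "UPN-B": "REF-2"}
# NPD extract keyed by pupil reference number.
npd = {"REF-1": {"fsm": True, "gender": "F"}}


def link(survey, project_to_upn, upn_to_ref, npd):
    """Follow the key chain for each survey row.

    Returns (linked, unlinked): linked rows carry the NPD fields,
    unlinked rows are returned unchanged.
    """
    linked, unlinked = [], []
    for row in survey:
        upn = project_to_upn.get(row["project_id"])
        ref = upn_to_ref.get(upn) if upn is not None else None
        record = npd.get(ref) if ref is not None else None
        if record is None:
            unlinked.append(row)
        else:
            # Merge survey answers with the NPD characteristics.
            linked.append({**row, **record})
    return linked, unlinked


linked, unlinked = link(survey, project_to_upn, upn_to_ref, npd)
```

In this toy example only DP001 can be followed through every step; the other two rows drop out at different points in the chain, which is the kind of loss the following slides discuss.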
from schools that do not supply UPNs
due to non-matching of UPNs and PMRs
due to missing data in NPD.
Raises questions about how representative the data are.
Inconsistent data between DP survey and NPD: gender in some cases.
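One way to keep track of these problems is to record, for every survey respondent, the step at which linkage failed, and to flag linked records where the two sources disagree. The sketch below is a hedged illustration only: the data, identifiers and outcome labels are invented, and "missing data in NPD" is assumed here to mean that no NPD record was found, although it could equally mean missing fields within a record.

```python
from collections import Counter

# Hypothetical toy data: all identifiers and values are invented.
survey = [
    {"project_id": "DP001", "gender": "F"},
    {"project_id": "DP002", "gender": "M"},
    {"project_id": "DP003", "gender": "F"},
    {"project_id": "DP004", "gender": "M"},
    {"project_id": "DP005", "gender": "M"},
]
project_to_upn = {"DP001": "UPN-A", "DP002": "UPN-B",
                  "DP003": "UPN-C", "DP005": "UPN-E"}
upn_to_ref = {"UPN-A": "REF-1", "UPN-B": "REF-2", "UPN-E": "REF-5"}
npd = {"REF-1": {"gender": "F"}, "REF-5": {"gender": "F"}}


def linkage_outcome(project_id, project_to_upn, upn_to_ref, npd):
    """Return (outcome label, NPD record or None) for one respondent."""
    upn = project_to_upn.get(project_id)
    if upn is None:
        return "no_upn_from_school", None
    ref = upn_to_ref.get(upn)
    if ref is None:
        return "upn_not_matched", None
    if ref not in npd:
        return "missing_in_npd", None
    return "linked", npd[ref]


outcomes = Counter()
gender_conflicts = []
for row in survey:
    outcome, record = linkage_outcome(
        row["project_id"], project_to_upn, upn_to_ref, npd)
    outcomes[outcome] += 1
    # Flag linked pupils whose survey gender differs from the NPD gender.
    if record is not None and row["gender"] != record["gender"]:
        gender_conflicts.append(row["project_id"])
```

Tabulating `outcomes` by school or by wave would show where the losses are concentrated, which bears directly on how representative the linked sample is.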
Able to compare students from two waves of the survey
Able to control for pupil characteristics in analysis of questionnaire responses when comparing years or schools.
Able to address research questions on relationship between pupils’ characteristics and experience of school.