Constructing an adolescence friendship network within the ALSPAC birth cohort using probabilistic record linkage techniques.


Research Team

  • Simon Burgess (CMPO, Bristol)

  • Eleanor Sanderson (CMPO, Bristol)

  • Marcela Umaña (CMPO, Bristol)

  • Andy Boyd (ALSPAC, Bristol)


Study Rationale

Social Networks are ubiquitous and powerful

“The people with whom we interact… influence our beliefs, decisions and behaviours” Jackson 2010

The manner in which networks carry this influence depends in detail on the structure and characteristics of the network.


Background

  • Examples of researching Networks

    • ADD Health – Longitudinal survey of school children in the US. Questionnaire included a list of pupils in the school, respondent asked to nominate their five best male and five best female friends

    • Others based around communication networks or other defined communities


Background

  • Advantages of studying social networks in a cohort study:

    • Extensive phenotype and genotype data, and extensive linkage data


Background

  • Advantages of studying social networks in ALSPAC:

    • Regional catchment area, narrow age range of participants (18 month age range, 3 school years)

  • Disadvantages:

    • Only the study participant is asked to nominate their friends


Data Collection Methodology

  • School based (register based) method not considered feasible

    • Cost

    • School Engagement

  • Questionnaire based alternative

    • Sent to participants still in compulsory education (age ~15-16)

    • Where the participant still lived in England


Data Collection Methodology

  • Asked the participant to nominate their 5 best friends, in no particular order


Linkage Objectives

  • To identify all unique individuals from the pool of nominated friends (de-duplication)

  • To identify which of the nominated friends are also eligible to participate in ALSPAC


Before we get to linkage… there’s ethics

  • Seeking personal identifiers of participants’ friends was seen as contentious

  • Lawyers advised us that this is legal and within the bounds of the Data Protection Act 1998

  • Personal identifiers to be used for statistical purposes only and pseudonymised prior to research use


Before we get to linkage… there’s ethics

  • Once the nominated friends have been coded the personal identifiers cannot be used again.

  • No longitudinal follow up possible on the full data set, but it is possible on those linked to ALSPAC.


The Data

  • 3,132 participants returned a questionnaire

  • 14,500 nominated friends

  • Personal Identifiers include:

    • Name, Date of Birth, School, School year, gender

  • Phenotypic data includes:

    • How they met, duration of friendship, shared interests


Data Quality

  • Completeness of highly distinguishing personal identifiers

    • 14,414 nominated friends had at least 2 identifiers

    • 12,612 nominated friends had at least 3 identifiers

    • 6,215 nominated friends had all four identifiers
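
A completeness tally like the one above could be produced along the following lines. This is only a sketch: the slides do not say which four identifiers were counted, so the field names in `KEY_FIELDS` are an assumption, and the record layout is hypothetical.

```python
# Assumed (not stated in the slides): the four highly distinguishing identifiers.
KEY_FIELDS = ("name", "date_of_birth", "school", "school_year")


def count_identifiers(record: dict) -> int:
    """Count the key identifiers that are present (not missing or blank)."""
    return sum(1 for field in KEY_FIELDS if str(record.get(field) or "").strip())


def completeness_table(records: list) -> dict:
    """Map k -> number of records with at least k key identifiers."""
    counts = [count_identifiers(r) for r in records]
    return {k: sum(1 for c in counts if c >= k)
            for k in range(1, len(KEY_FIELDS) + 1)}
```

Calling `completeness_table` on the list of nominated-friend records returns one cumulative count per threshold, in the same "at least k identifiers" form as the figures on the slide.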


Data Quality

  • All data reported by a participant (age ~16) about their friends

    • Some of this will be unknown or prone to greater error, particularly date of birth and non-local schools

    • Names include many spelling errors

    • Names and school details include many abbreviations and familiar names


Standardisation

  • School names coded to the National Pupil Database ‘Unique Reference Number’ (using http://www.edubase.gov.uk)

  • Names converted to upper case

  • All spaces and symbols contained within a name removed:

    • O’Driscoll to ODRISCOLL

    • St.Claire to STCLAIRE
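
The name clean-up rules above can be sketched in a few lines of Python. The function name `standardise_name` is hypothetical, and treating every non-letter character as a "space or symbol" is an assumption about how the rule was applied.

```python
def standardise_name(raw: str) -> str:
    """Upper-case a name and drop every character that is not a letter."""
    return "".join(ch for ch in raw.upper() if ch.isalpha())


# The two examples from the slide:
print(standardise_name("O'Driscoll"))  # ODRISCOLL
print(standardise_name("St.Claire"))   # STCLAIRE
```

Using `str.isalpha` rather than an A-Z character class keeps accented letters (for example the ñ in Umaña) instead of silently deleting them.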


Standardisation

  • Names matched to a name Lexicon, compiled from:

    • NHS name lexicon

    • National Pupil Database

    • ALSPAC ‘known as’ names

    • “A Dictionary of First Names”, Oxford University Press 2006

  • Non-matching names evaluated using Jaro string comparator metrics (assesses spelling differences, typos, keying errors, string lengths)

    • See Herzog, Scheuren and Winkler 2007
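
For readers unfamiliar with the Jaro comparator mentioned above, the following is a minimal sketch of the standard Jaro similarity (a score between 0 and 1). It illustrates the general metric only; the slides do not say which implementation or acceptance cut-off the team used.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity: 1.0 for identical strings, 0.0 for no overlap."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk the matched characters in order; out-of-order pairs are
    # half-transpositions.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3
```

A name that fails to match the lexicon exactly could then be compared against lexicon entries and the highest-scoring candidates passed on for review.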


Standardisation

  • Name Lexicon examples:

    • Andrew, Andy, Andi, Drew all categorised to the same male group

    • Abigail, Abbie, Abi, Ab1 all categorised to the same group

  • When are two linked names not the same?

    • E.g. Should Abraham and Ibrahim be categorised together?

  • Names can be included in multiple groups (impacts on linkage evaluation)
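
The grouping idea can be sketched as a lookup from name variants to lexicon groups. The group labels below are hypothetical, and the `ANDREA_F` group is invented purely to show a name ("ANDI") that sits in more than one group.

```python
# Hypothetical lexicon: group label -> standardised name variants.
LEXICON = {
    "ANDREW_M": {"ANDREW", "ANDY", "ANDI", "DREW"},
    "ABIGAIL": {"ABIGAIL", "ABBIE", "ABI", "AB1"},
    "ANDREA_F": {"ANDREA", "ANDI"},  # invented group, for illustration only
}


def groups_for(name: str) -> set:
    """Return every lexicon group that contains this standardised name."""
    return {group for group, variants in LEXICON.items() if name in variants}


def names_agree(a: str, b: str) -> bool:
    """Names agree if identical or if they share at least one lexicon group."""
    return a == b or bool(groups_for(a) & groups_for(b))
```

Because "ANDI" belongs to two groups here, it agrees with both "DREW" and "ANDREA" even though those two names do not agree with each other, which is one way multiple membership complicates linkage evaluation.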


Standardisation

  • Impact of Lexicon, unique values condensed into categories:

    • Forenames 2,108 into 1,339

    • Surnames 5,743 into 4,895


Linkage Methodology

  • Used approach developed by Fellegi & Sunter (1969)

    Aim to simulate human reasoning by comparing each of several elements from the two records… from fundamental concepts of probability

    Clark 2004


Estimating Match Weights

  • For a given field with match probability M and unmatch probability U

    • For an agreement:

      • log(M / U)

    • For a disagreement:

      • log((1 − M) / (1 − U))

    • Sum the weights across all field comparisons
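
As a worked sketch of the weight calculation: the slides do not state the base of the logarithm, so base 2 is assumed here (a common convention in record linkage), and the M and U values in the usage note are made up for illustration.

```python
import math


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Agreement weight log2(M/U), or disagreement weight log2((1-M)/(1-U))."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def total_weight(comparisons: dict, probabilities: dict) -> float:
    """Sum field weights; probabilities maps field name -> (M, U)."""
    return sum(field_weight(agrees, *probabilities[field])
               for field, agrees in comparisons.items())
```

For example, with illustrative values `{"surname": (0.95, 0.01), "gender": (0.98, 0.5)}`, a pair that agrees on surname but disagrees on gender gets a large positive surname weight partly offset by a negative gender weight. Agreement on a rare-by-chance field (small U) contributes far more than agreement on a field like gender.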


Weightings

  • M-Probability: Probability that the identifier agrees given a true match

    • Based on assessment of the quality of the data (i.e. data entry errors, missing data but accounting for improvements due to cleaning and standardisation)


Weightings

  • U-Probability: Probability that the identifier agrees given that the records do not constitute a true match

    • Based on ‘Gold Standard’ of the existing ALSPAC – National Pupil Database linkage

    • Supported by the data: 95% of nominated friends were described as being in education in the ALSPAC time period


Stratification or ‘blocking’

  • Large number (14,500 x 14,500) of possibilities to evaluate

    • So we ‘blocked’ on identifiers with low discriminatory potential (gender, school year) and high potential (name, school)

    • Multiple iterations so as not to exclude cases which contained errors in the blocking identifiers
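
Multi-pass blocking can be sketched as follows: only records sharing a blocking key are compared, and several passes with different keys are combined so that an error in one identifier does not exclude a true pair. The field names and the choice of passes are hypothetical, not the team's actual configuration.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records: dict, blocking_passes: list) -> set:
    """records: id -> dict of identifiers; each pass is a tuple of field names."""
    pairs = set()
    for fields in blocking_passes:
        blocks = defaultdict(list)
        for rec_id, rec in records.items():
            key = tuple(rec.get(field) for field in fields)
            if None in key:  # cannot block on a missing identifier
                continue
            blocks[key].append(rec_id)
        # Compare only records that fall in the same block.
        for ids in blocks.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs
```

With passes such as `[("surname", "gender"), ("school", "gender")]`, a misspelt surname would drop a pair from the first pass but the pair could still be recovered in the second.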


Manual Review

  • Evaluated a random selection of cases to determine thresholds for accepting a match as:

    • Definitely ‘true’ (including some false positives)

    • Definitely ‘false’ (excluding some true positives)


Manual Review

  • Cases with results between the two thresholds all manually reviewed
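
The two-threshold decision rule can be written as a small function. The threshold values are placeholders: in the study they were set by manually evaluating a random selection of cases, and the actual values are not given in the slides.

```python
def classify(weight: float, lower: float, upper: float) -> str:
    """Assign a candidate pair to a decision band by its total match weight."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "manual review"
```

Raising `upper` reduces false positives and lowering `lower` reduces missed matches, both at the cost of a larger manual review band.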


Results

  • Data

    • 3,123 respondents

    • nominated 4.64 friends on average

    • 14,503 nominated friends

  • First Phase of Linkage

    • 11,327 individuals identified

  • Linkage to ALSPAC

    • 6,961 nominated friends linked

    • 4,572 individuals linked


Results: Network Structure

  • Total Network

    • 13,056 individuals in total

      (1,394 respondents are also nominated as a friend)

  • 50% of nominations are to someone in ALSPAC

    • 12% of nominations are to someone who is also a respondent to the friendship questionnaire


Results: Network Structure

  • Largest component contains 2/3 of the individuals in the network
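
The share of individuals in the largest component can be computed from the nomination list with a breadth-first search. This sketch assumes nominations are treated as undirected ties (weakly connected components); the slides do not say how components were defined, and the IDs are hypothetical pseudonymised labels.

```python
from collections import defaultdict, deque


def largest_component_share(nominations) -> float:
    """nominations: iterable of (respondent_id, friend_id) pairs."""
    adjacency = defaultdict(set)
    for a, b in nominations:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    largest = 0
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search over one connected component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        largest = max(largest, size)
    return largest / len(adjacency) if adjacency else 0.0
```

Individuals who neither nominate nor are nominated do not appear in the nomination list, so they are not counted in the denominator here.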


Future Research

  • Structure of the network

  • Homophily

    • The tendency to establish relationships among people who share similar characteristics or attributes


Future Research

  • Risk taking behaviour

  • Antisocial behaviour

  • Transition into Higher Education, Employment or unemployment

  • And many more…


Reflections on Linkage Process

Quality of the data determines the quality of the linkage

  • To reflect this the majority of time/resource was spent on data cleaning, standardisation and extensive manual verification


Reflections on Linkage Process

Establishing the weightings

  • Method not without problems, as it excludes privately educated pupils, who have different name frequencies

  • Weighting established on national population, but ALSPAC regionally clustered

  • Potential to use statistical approaches instead


Reflections on Linkage Process

Ultimately

  • While resource intensive, the methodology did allow the identification of a friendship network within ALSPAC

  • Little evidence to suggest that this was as ethically contentious from the cohort’s perspective as expected (based only on response rates and small numbers of complaints – further research into this would have been of interest)


Continuing Role of Linkage

  • Linkage to administrative records adds to the ALSPAC resource, providing new data which can be used in social network analysis


Thank You

Questions?

Andy Boyd


References

  • Clark DE (2004) Practical introduction to record linkage for injury research. Injury Prevention 10, 186-191

  • Fellegi IP & Sunter AB (1969) A theory for record linkage. Journal of the American Statistical Association 64, 1183-1210

  • Herzog TN, Scheuren FJ and Winkler WE (2007) Data Quality and Record Linkage Techniques. New York: Springer.

  • Jackson M (2010) An overview of social networks and economic applications. In Handbook of social economics, edited by Benhabib J, Bisin A & Jackson M

