Constructing an adolescence friendship network within the ALSPAC birth cohort using probabilistic record linkage techniques.


Research Team

  • Simon Burgess (CMPO, Bristol)

  • Eleanor Sanderson (CMPO, Bristol)

  • Marcela Umaña (CMPO, Bristol)

  • Andy Boyd (ALSPAC, Bristol)


Study Rationale

Social networks are ubiquitous and powerful

“The people with whom we interact… influence our beliefs, decisions and behaviours” Jackson 2010

The manner in which networks carry this influence depends in detail on the structure and characteristics of the network.


Background

  • Examples of network research

    • Add Health: longitudinal survey of school children in the US. The questionnaire included a list of pupils in the school; respondents were asked to nominate their five best male and five best female friends

    • Other studies are based on communication networks or other defined communities


Background

  • Advantages of studying social networks in a cohort study:

    • Extensive phenotype and genotype data, and extensive linkage data


Background

  • Advantages of studying social networks in ALSPAC:

    • Regional catchment area and narrow age range of participants (18-month age range, 3 school years)

  • Disadvantages:

    • Only the study participant is asked to nominate their friends


Data Collection Methodology

  • School-based (register-based) method not considered feasible because of:

    • Cost

    • School engagement

  • Questionnaire-based alternative

    • Sent to participants still in compulsory education (age ~15-16)

    • Restricted to participants still living in England


Data Collection Methodology

  • Asked the participant to nominate their 5 best friends, in no particular order


Data Collection Methodology Continued…


Linkage Objectives

  • To identify all unique individuals from the pool of nominated friends (de-duplication)

  • To identify which of the nominated friends are also eligible to participate in ALSPAC


Before we get to linkage… there’s ethics


Before we get to linkage… there’s ethics

  • Seeking personal identifiers of participants’ friends was seen as contentious

  • Lawyers advised us that this is legal and within the bounds of the Data Protection Act 1998

  • Personal identifiers to be used for statistical purposes only and pseudonymised prior to research use


Before we get to linkage… there’s ethics

  • Once the nominated friends have been coded, the personal identifiers cannot be used again.

  • No longitudinal follow-up is possible on the full data set, but it is possible for those linked to ALSPAC.


The Data

  • 3,132 participants returned a questionnaire

  • 14,500 nominated friends

  • Personal Identifiers include:

    • Name, date of birth, school, school year, gender

  • Phenotypic data includes:

    • How they met, duration of friendship, shared interests


Data Quality

  • Completeness of highly distinguishing personal identifiers

    • 14,414 nominated friends with at least two identifiers

    • 12,612 nominated friends with at least three identifiers

    • 6,215 nominated friends with all four identifiers


Data Quality


Data Quality

  • All data reported by a participant (age ~16) about their friends

    • Some of this will be unknown or prone to greater error, particularly date of birth and non-local schools

    • Names include many spelling errors

    • Names and school details include many abbreviations and familiar names


Standardisation

  • School names coded to the National Pupil Database ‘Unique Reference Number’ (using http://www.edubase.gov.uk)

  • Names converted to upper case

  • All spaces and symbols contained within a name removed:

    • O’Driscoll to ODRISCOLL

    • St.Claire to STCLAIRE
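The two rules above (upper case; strip spaces and symbols) can be sketched in Python as follows. This is a minimal illustration, not the team's code. Two choices are assumptions the slides do not state: accented letters are folded to their base letter (so “Umaña” becomes UMANA rather than losing a letter), and digits are kept because they are not symbols.

```python
import re
import unicodedata


def standardise_name(raw: str) -> str:
    """Upper-case a name and remove spaces and symbols.

    Assumptions (not from the slides): accents are folded to the base
    letter, and digits are retained.
    """
    # NFKD splits e.g. 'ñ' into 'n' + a combining tilde; the ASCII encode
    # then drops the combining mark and any other non-ASCII symbol.
    folded = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode("ascii")
    # Upper-case, then drop everything except A-Z and 0-9
    return re.sub(r"[^A-Z0-9]", "", folded.upper())


print(standardise_name("O’Driscoll"))  # ODRISCOLL
print(standardise_name("St.Claire"))   # STCLAIRE
```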


Standardisation

  • Names matched to a name lexicon, compiled from:

    • NHS name lexicon

    • National Pupil Database

    • ALSPAC ‘known as’ names

    • Non-matching names evaluated using Jaro string comparator metrics (assesses spelling differences, typos, keying errors, string lengths)

      • See Herzog, Scheuren and Winkler 2007

    • “A Dictionary of First Names” Oxford University Press 2006
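For readers unfamiliar with the Jaro comparator, the standard Jaro similarity can be sketched in Python as below (1.0 for identical strings, 0.0 for no characters in common). The slides do not give the cut-off used to treat two names as equivalent, so none is assumed here.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, from 0.0 (no match) to 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters only count as matching if they lie within this distance
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_hit = [False] * len1
    s2_hit = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not s2_hit[j] and s2[j] == ch:
                s1_hit[i] = s2_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order; each swap counts twice
    out_of_order = 0
    j = 0
    for i in range(len1):
        if s1_hit[i]:
            while not s2_hit[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order / 2

    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


print(round(jaro("MARTHA", "MARHTA"), 4))  # 0.9444
print(round(jaro("DWAYNE", "DUANE"), 4))   # 0.8222
```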


Standardisation

  • Name lexicon examples:

    • Andrew, Andy, Andi, Drew all categorised to the same male group

    • Abigail, Abbie, Abi, Ab1 all categorised to the same group

  • When should two linked names not be treated as the same?

    • E.g. should Abraham and Ibrahim be categorised together?

  • Names can be included in multiple groups (impacts on linkage evaluation)
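A minimal Python sketch of how a lexicon with multi-group names might be used for comparison, assuming two names "agree" when they share at least one group. The miniature lexicon, the group IDs, and the ALEX example are hypothetical, not the study's lexicon.

```python
# Hypothetical miniature lexicon: standardised name -> set of group IDs.
NAME_GROUPS: dict[str, set[str]] = {
    "ANDREW": {"G_ANDREW"},
    "ANDY": {"G_ANDREW"},
    "ANDI": {"G_ANDREW"},
    "DREW": {"G_ANDREW"},
    "ABIGAIL": {"G_ABIGAIL"},
    "ABBIE": {"G_ABIGAIL"},
    "ABI": {"G_ABIGAIL"},
    # Illustrative multi-group name: ALEX could stand for either name below
    "ALEX": {"G_ALEXANDER", "G_ALEXANDRA"},
    "ALEXANDER": {"G_ALEXANDER"},
    "ALEXANDRA": {"G_ALEXANDRA"},
}


def groups_for(name: str) -> frozenset[str]:
    """Lexicon groups for a name; names not in the lexicon form their own group."""
    return frozenset(NAME_GROUPS.get(name, {name}))


def names_agree(a: str, b: str) -> bool:
    """Two names agree if they share at least one lexicon group."""
    return not groups_for(a).isdisjoint(groups_for(b))


print(names_agree("ANDY", "DREW"))            # True
print(names_agree("ALEX", "ALEXANDRA"))       # True
print(names_agree("ALEXANDER", "ALEXANDRA"))  # False
```

Agreement is not transitive under this rule (ALEX agrees with both full names, which do not agree with each other), which is one way multi-group names can complicate evaluating a linkage.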


Standardisation

  • Impact of the lexicon: unique values condensed into categories:

    • Forenames: 2,108 into 1,339

    • Surnames: 5,743 into 4,895


Linkage Methodology

  • Used the approach developed by Fellegi & Sunter (1969)

    “Aim to simulate human reasoning by comparing each of several elements from the two records… from fundamental concepts of probability” Clark 2004


Estimating Match Weights

  • For a given field with m-probability M and u-probability U:

    • For an agreement:

      • $\log(M/U)$

    • For a disagreement:

      • $\log\left(\frac{1-M}{1-U}\right)$

    • Sum the weights across all compared fields to give the total match weight for the record pair
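The weight calculation above can be sketched in Python. Assumptions not taken from the slides: base-2 logarithms (no base is given), illustrative M and U values, and a weight of zero when a field is missing on either record (a common convention, not one stated here).

```python
import math
from typing import Optional


def field_weight(agrees: Optional[bool], m: float, u: float) -> float:
    """Fellegi-Sunter weight for one field comparison.

    agrees=None means the field is missing on at least one record;
    it contributes 0 here (an assumption).
    """
    if agrees is None:
        return 0.0
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def total_weight(agreement: dict[str, Optional[bool]],
                 params: dict[str, tuple[float, float]]) -> float:
    """Sum the field weights over every field that has (M, U) parameters."""
    return sum(field_weight(agreement.get(field), m, u)
               for field, (m, u) in params.items())


# Hypothetical (M, U) values for illustration only
PARAMS = {
    "forename": (0.95, 0.01),
    "surname": (0.95, 0.005),
    "dob": (0.80, 0.003),
    "school": (0.90, 0.05),
}

pair = {"forename": True, "surname": True, "dob": None, "school": False}
print(round(total_weight(pair, PARAMS), 2))  # 10.89
```

Positive totals favour a match and negative totals favour a non-match; where the cut-offs sit is a separate decision.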


Weightings

  • M-Probability: probability that the identifier agrees given that the records are a true match

    • Based on an assessment of data quality (e.g. data-entry errors and missing data), allowing for the improvements gained through cleaning and standardisation


Weightings

  • U-Probability: probability that the identifier agrees given that the records are not a true match

    • Based on the ‘gold standard’ of the existing ALSPAC–National Pupil Database linkage

    • Supported by the data: 95% of nominated friends were described as being in education during the ALSPAC time period
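The slides do not say exactly how the U-probabilities were derived from the ALSPAC–National Pupil Database linkage. One common frequency-based approach, sketched here under that assumption, takes the value-specific u-probability of a name as its relative frequency in a reference population, so agreement on a rare surname earns more weight than agreement on a common one. The field-level figure is then the sum of squared frequencies. The surnames below are hypothetical.

```python
from collections import Counter


def value_specific_u(reference_values: list[str]) -> dict[str, float]:
    """Relative frequency of each non-empty value in a reference population.

    Approximates the chance that an unrelated record happens to carry this value.
    """
    counts = Counter(v for v in reference_values if v)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


def field_u(reference_values: list[str]) -> float:
    """Chance that two randomly drawn records agree on this field."""
    return sum(p * p for p in value_specific_u(reference_values).values())


# Toy reference population (hypothetical surnames)
ref = ["SMITH", "SMITH", "JONES", "BROWN"]
print(value_specific_u(ref))  # {'SMITH': 0.5, 'JONES': 0.25, 'BROWN': 0.25}
print(field_u(ref))           # 0.375
```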


Stratification or ‘blocking’

  • Large number of possible pairs to evaluate (14,500 × 14,500)

    • So we ‘blocked’ on identifiers with low discriminatory potential (gender, school year) and with high potential (name, school)

    • Multiple iterations, so as not to exclude cases that contained errors in the blocking identifiers
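Multi-pass blocking can be sketched in Python: each pass groups records on one blocking key, only pairs inside the same block are compared, and the candidate pairs from all passes are pooled. The three passes, field names and records below are hypothetical illustrations of mixing low- and high-discrimination identifiers, not the study's actual blocking scheme.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Optional

Record = dict[str, str]
BlockKey = Callable[[Record], Optional[tuple]]


def candidate_pairs(records: list[Record], passes: list[BlockKey]) -> set[tuple[int, int]]:
    """Union of within-block pairs over several blocking passes.

    A key function returns None when a record cannot be blocked in that
    pass (e.g. the field is missing), so it is skipped for that pass only.
    """
    pairs: set[tuple[int, int]] = set()
    for key_fn in passes:
        blocks: dict[tuple, list[int]] = defaultdict(list)
        for idx, rec in enumerate(records):
            key = key_fn(rec)
            if key is not None:
                blocks[key].append(idx)
        for members in blocks.values():
            pairs.update(combinations(members, 2))  # indices ascend, so i < j
    return pairs


# Hypothetical passes
PASSES: list[BlockKey] = [
    lambda r: (r["gender"], r["year"], r["school"]) if r["school"] else None,
    lambda r: (r["gender"], r["year"], r["surname"]) if r["surname"] else None,
    lambda r: (r["forename"], r["surname"]) if r["forename"] and r["surname"] else None,
]

records = [
    {"forename": "ANDREW", "surname": "SMITH", "gender": "M", "year": "10", "school": "123"},
    {"forename": "ANDY", "surname": "SMITH", "gender": "M", "year": "10", "school": ""},
    {"forename": "ANDREW", "surname": "SMITH", "gender": "M", "year": "11", "school": "123"},
    {"forename": "JANE", "surname": "JONES", "gender": "F", "year": "10", "school": "123"},
]
print(sorted(candidate_pairs(records, PASSES)))  # [(0, 1), (0, 2)]
```

Record 2's school year looks mis-keyed, so both passes that use school year miss it; the name-only pass still brings it together with record 0.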


Manual Review

  • Evaluated a random selection of cases to set the thresholds for classifying a pair as:

    • Definitely ‘true’ (accepting that this includes some false positives)

    • Definitely ‘false’ (accepting that this rejects some true matches)


Manual Review

  • All cases with weights between the two thresholds were manually reviewed
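The two-threshold rule from these slides amounts to a three-way classification, sketched below in Python. The threshold values are hypothetical (in the study they were set by reviewing a random sample), and treating a weight exactly on a threshold as decided rather than reviewed is an assumption.

```python
def classify(weight: float, lower: float, upper: float) -> str:
    """Three-way decision on a total match weight.

    Pairs at or above `upper` are accepted, pairs at or below `lower` are
    rejected, and everything in between goes to manual review.
    """
    if lower >= upper:
        raise ValueError("lower threshold must be below upper threshold")
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "manual review"


# Hypothetical thresholds for illustration
for w in (15.2, 4.0, -6.5):
    print(w, classify(w, lower=0.0, upper=10.0))
```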


Results

  • Data

    • 3,123 respondents

    • nominated 4.64 friends on average

    • 14,503 nominated friends

  • First Phase of Linkage

    • 11,327 individuals identified

  • Linkage to ALSPAC

    • 6,961 nominated friends linked

    • 4,572 individuals linked
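The headline figures above can be cross-checked with simple arithmetic on the reported counts (a consistency check only, not an additional result):

```latex
\frac{14{,}503 \text{ nominations}}{3{,}123 \text{ respondents}} \approx 4.64 \text{ friends per respondent},
\qquad
\frac{6{,}961 \text{ linked nominations}}{14{,}503 \text{ nominations}} \approx 0.48
```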


Results


Results: Network Structure

  • Total Network

    • 13,056 individuals in total

      (1,394 respondents are also nominated as a friend)

  • 50% of nominations are to someone in ALSPAC

    • 12% of nominations are to someone who is also a respondent to the friendship questionnaire
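Assuming the 11,327 individuals identified in the first phase of linkage are the distinct nominated friends, the network total reconciles as:

```latex
\underbrace{11{,}327}_{\text{distinct nominees}} + \underbrace{3{,}123}_{\text{respondents}} - \underbrace{1{,}394}_{\text{respondents also nominated}} = 13{,}056 \text{ individuals}
```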


Results: Network Structure

  • Largest component contains 2/3 of the individuals in the network
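The largest-component figure can be computed with a union-find pass over the friendship ties, sketched below in Python on a toy network. Treating nominations as undirected ties is an assumption; the slides do not say whether weak or strong components were used.

```python
from collections import Counter
from typing import Iterable


def component_sizes(n_nodes: int, edges: Iterable[tuple[int, int]]) -> list[int]:
    """Sizes of connected components, largest first (union-find).

    Nominations are treated as undirected ties (an assumption).
    """
    parent = list(range(n_nodes))

    def find(x: int) -> int:
        # Path halving keeps the trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(i) for i in range(n_nodes))
    return sorted(sizes.values(), reverse=True)


def largest_component_share(n_nodes: int, edges: Iterable[tuple[int, int]]) -> float:
    """Fraction of all individuals that sit in the largest component."""
    sizes = component_sizes(n_nodes, edges)
    return sizes[0] / n_nodes if sizes else 0.0


# Toy network: 6 people, 3 nomination ties
toy_edges = [(0, 1), (1, 2), (3, 4)]
print(component_sizes(6, toy_edges))          # [3, 2, 1]
print(largest_component_share(6, toy_edges))  # 0.5
```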


Future Research

  • Structure of the network

  • Homophily

    • The tendency to establish relationships among people who share similar characteristics or attributes
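A simple first look at homophily compares the share of ties joining people with the same attribute against what random mixing would give. This is an illustrative Python sketch on hypothetical toy data; the slides do not specify which homophily measures will be used.

```python
from collections import Counter
from collections.abc import Hashable, Iterable, Mapping


def same_attribute_share(edges: Iterable[tuple[int, int]], attr: Mapping[int, Hashable]) -> float:
    """Observed share of ties joining two people with the same attribute value."""
    edge_list = list(edges)
    if not edge_list:
        return 0.0
    same = sum(1 for a, b in edge_list if attr[a] == attr[b])
    return same / len(edge_list)


def random_mixing_share(attr: Mapping[int, Hashable]) -> float:
    """Rough expected share if ties ignored the attribute: the sum of squared
    group proportions (this simple version allows self-pairs, so it is approximate)."""
    counts = Counter(attr.values())
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values()) if n else 0.0


# Hypothetical toy data: gender of four people and three friendship ties
gender = {0: "M", 1: "M", 2: "F", 3: "F"}
ties = [(0, 1), (2, 3), (0, 2)]
print(round(same_attribute_share(ties, gender), 3))  # 0.667
print(random_mixing_share(gender))                  # 0.5
```

An observed share well above the random-mixing figure would suggest homophily on that attribute.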


Future Research

  • Risk taking behaviour

  • Antisocial behaviour

  • Transition into Higher Education, Employment or unemployment

  • And many more…


Reflections on the Linkage Process


Reflections on Linkage Process

Quality of the data determines the quality of the linkage

  • To reflect this, most of the time and resources were spent on data cleaning, standardisation and extensive manual verification


Reflections on Linkage Process

Establishing the weightings

  • The method is not without problems, as it excludes privately educated pupils, who have different name frequencies

  • Weightings were established on the national population, but ALSPAC is regionally clustered

  • Potential to estimate the weightings using statistical approaches instead
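The slides leave “statistical approaches” open. One widely used option is to estimate M, U and the overall match proportion directly from the candidate pairs with the expectation-maximisation (EM) algorithm, assuming fields agree independently within matches and within non-matches. The Python sketch below shows that generic technique on synthetic agreement patterns; it is not something the team reports having done.

```python
from typing import Sequence


def em_fellegi_sunter(
    patterns: Sequence[Sequence[int]],
    m: list[float],
    u: list[float],
    p: float = 0.1,
    n_iter: int = 200,
    eps: float = 1e-6,
) -> tuple[list[float], list[float], float]:
    """Estimate M, U and the match proportion p from 0/1 agreement patterns.

    Each pattern holds one agreement indicator per field for a candidate
    pair. Fields are assumed conditionally independent given match status.
    """
    n = len(patterns)
    m, u = list(m), list(u)
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match
        g = []
        for gamma in patterns:
            pm, pu = p, 1.0 - p
            for i, agree in enumerate(gamma):
                pm *= m[i] if agree else 1.0 - m[i]
                pu *= u[i] if agree else 1.0 - u[i]
            g.append(pm / (pm + pu))
        # M-step: re-estimate parameters from the weighted pairs
        g_sum = max(sum(g), eps)
        not_sum = max(n - sum(g), eps)
        p = min(max(g_sum / n, eps), 1 - eps)
        for i in range(len(m)):
            m_i = sum(gj * gamma[i] for gj, gamma in zip(g, patterns)) / g_sum
            u_i = sum((1 - gj) * gamma[i] for gj, gamma in zip(g, patterns)) / not_sum
            # Clamp away from 0 and 1 so the log weights stay finite
            m[i] = min(max(m_i, eps), 1 - eps)
            u[i] = min(max(u_i, eps), 1 - eps)
    return m, u, p


# Synthetic agreement patterns on three fields (hypothetical data)
patterns = [(1, 1, 1)] * 100 + [(0, 0, 0)] * 810 + [(1, 0, 0)] * 30 + [(0, 1, 0)] * 30 + [(0, 0, 1)] * 30
m_hat, u_hat, p_hat = em_fellegi_sunter(patterns, m=[0.9] * 3, u=[0.1] * 3, p=0.05)
print([round(x, 3) for x in m_hat], [round(x, 3) for x in u_hat], round(p_hat, 3))
```

The estimated M and U could then feed the agreement and disagreement weights in place of values set from external data, though the conditional-independence assumption rarely holds exactly for names and dates of birth.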


Reflections on Linkage Process

Ultimately

  • While resource-intensive, the methodology did allow the identification of a friendship network within ALSPAC

  • Little evidence to suggest that this was as ethically contentious from the cohort’s perspective as expected (based only on response rates and the small number of complaints; further research into this would have been of interest)


Continuing Role of Linkage

  • By adding to the ALSPAC resource, linkage to administrative records is providing new data that can be used in social network analysis


Thank You

Questions?

Andy Boyd



References

  • Clark DE (2004) Practical introduction to record linkage for injury research. Injury Prevention 10, 186-191

  • Fellegi IP & Sunter AB (1969) A theory for record linkage. Journal of the American Statistical Association 64, 1183-1210

  • Herzog TN, Scheuren FJ & Winkler WE (2007) Data Quality and Record Linkage Techniques. New York: Springer.

  • Jackson M (2010) An overview of social networks and economic applications. In Benhabib J, Bisin A & Jackson M (eds) Handbook of Social Economics.

