What’s in a Name? Accounting for Naming Conventions in NCHS Data Linkages

What’s in a Name? Accounting for Naming Conventions in NCHS Data Linkages. Eric A. Miller National Center for Health Statistics (NCHS) 2012 FCSM Statistical Policy Seminar December 4, 2012. “Two men say they’re Jesus. One of them must be wrong.”. Mark Knopfler , Dire Straits.

Presentation Transcript

What’s in a Name? Accounting for Naming Conventions in NCHS Data Linkages

Eric A. Miller

National Center for Health Statistics (NCHS)

2012 FCSM Statistical Policy Seminar

December 4, 2012


“Two men say they’re Jesus.

One of them must be wrong.”

Mark Knopfler, Dire Straits

What Does This Have to do With Data Quality?
  • One reason for data sharing is data linkage
  • Assessing the quality of linked data is different from assessing a standalone dataset
      • The quality of variables from a specific source doesn’t matter if the linkage is poor
      • Problems with linkage can produce poor quality data
    • Are the data fit for use? + Are the data fit for linkage?
Names
  • Names are commonly used in data linkages
  • Important to account for name differences and naming conventions to produce a high quality linked data file
Quick Background on Data Linkage
  • Deterministic
    • Exact match on linkage variables
      • Frank ≠ Francis
  • Probabilistic
    • Accounts for imperfect data
    • Probability of a match
      • Frank ≈ Francis
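
The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration, not the NCHS or NDI procedure: the field names, the sample records, and the m/u probabilities are all hypothetical. The deterministic rule rejects the pair because the first names differ, while a simple Fellegi-Sunter style weight still scores the pair as a likely match because the other fields agree.

```python
import math

# Hypothetical m/u probabilities (assumptions, not NCHS or NDI values):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PROBS = {
    "first_name": (0.90, 0.01),
    "last_name": (0.95, 0.005),
    "birth_year": (0.98, 0.02),
}

def deterministic_match(a, b):
    """Link only if every linkage variable agrees exactly."""
    return all(a[field] == b[field] for field in FIELD_PROBS)

def match_weight(a, b):
    """Sum log2 agreement/disagreement weights across the fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a[field] == b[field]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total

# Made-up records for illustration only.
survey = {"first_name": "FRANK", "last_name": "SMITH", "birth_year": 1940}
death = {"first_name": "FRANCIS", "last_name": "SMITH", "birth_year": 1940}

print(deterministic_match(survey, death))  # exact rule: no link
print(match_weight(survey, death))         # positive weight: plausible link
```

In practice the weight would be compared against thresholds chosen for the linkage, with borderline pairs sent to clerical review.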
Caveats of Data Linkage
  • It’s not perfect

[Figure: “Prince” ? “Prince” (Prince Rogers Nelson)]

Some things are out of our control!

Caveats of Data Linkage
  • Varying levels of quality for linkage variables can substantially increase workload
    • Clean-up, reformatting
    • Clerical review
  • Analysis of insufficiently linked data can produce biased estimates
Example - Hispanic Paradox
  • Despite having a higher risk profile, Hispanics have been found to have lower mortality rates compared to non-Hispanic whites

Markides and Coreil (1986). Public Health Reports; 101: 253-265

Mortality Rate per 100,000 Among Women in 1986-1990 National Health Interview Survey Linked to 1991 National Death Index

Liao et al. (1998). Mortality Patterns among Adult Hispanics: Findings from the NHIS, 1986 to 1990. AJPH.

Potential Reasons for Paradox
  • Health selective immigration
  • Salmon bias (return migration)
  • Advantageous health behaviors and social support
  • Data quality / Insufficient linkage
Potential Reasons for Paradox
  • Data quality / Insufficient linkage
    • Naming conventions for Hispanics differ from other US populations
      • Use of mother’s and father’s surname
      • May not have single middle name
    • Less likely to have a Social Security number
      • Especially among older adults and foreign born
Percent of “True” Matches for Hispanics and Non-Hispanic Whites by Foreign-Born Status

Class 1: records agree on at least 8 digits of SSN as well as first and last name, middle initial, and birth year (+/- 3 years)

Joseph Lariscy. Differential record linkage by Hispanic ethnicity and age in linked mortality studies: Implications for the epidemiologic paradox. J of Aging and Health (2011); 23: 1263-1284.
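
The Class 1 definition above can be read as a simple rule. The sketch below is one possible reading, not the NDI's implementation: it assumes SSN digits are compared position by position, that names and initials must be exactly equal after standardization, and that records are dictionaries with the hypothetical keys shown.

```python
def ssn_digits_agreeing(ssn_a, ssn_b):
    """Count positions where two 9-digit SSN strings carry the same digit."""
    if len(ssn_a) != 9 or len(ssn_b) != 9:
        return 0
    return sum(x == y for x, y in zip(ssn_a, ssn_b))

def is_class1(a, b):
    """Apply the Class 1 rule as stated on the slide (assumed interpretation)."""
    return (
        ssn_digits_agreeing(a["ssn"], b["ssn"]) >= 8
        and a["first"] == b["first"]
        and a["last"] == b["last"]
        and a["middle_initial"] == b["middle_initial"]
        and abs(a["birth_year"] - b["birth_year"]) <= 3
    )

# Made-up records for illustration only.
survey = {"ssn": "123456789", "first": "MARIA", "last": "LOPEZ",
          "middle_initial": "G", "birth_year": 1930}
death = {"ssn": "123456780", "first": "MARIA", "last": "LOPEZ",
         "middle_initial": "G", "birth_year": 1932}

print(is_class1(survey, death))
```

Under this reading, a respondent with no SSN or no single middle initial can never reach Class 1, which is the mechanism the slide points to for differential linkage.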

What does this have to do with NCHS?
  • NCHS Record Linkage Program
    • Links survey data with data collected from administrative records
    • Designed to maximize the scientific value of the NCHS population-based surveys
    • Examine factors that influence chronic disease, disability, health care utilization, morbidity, and mortality
Linked NCHS surveys
  • National Health Interview Survey (NHIS)
  • 1999-2004 NHANES, NHANES III, and NHANES II
  • NHANES I Epidemiologic Follow-up Study (NHEFS)
  • The Second Longitudinal Study of Aging (LSOA II)
  • National Nursing Home Survey (NNHS)
Linked Administrative Records
  • National Death Index
  • Medicare and Medicaid enrollment and claims
  • Social Security Administration Retirement and Disability
  • Pilot projects
    • Florida Cancer Data System
    • Texas Supplemental Nutrition Assistance Program (SNAP)
Case Study: NCHS Survey linkage with the NDI
  • National Death Index (NDI)
    • A national file of identifying death record information (beginning with 1979 deaths)
    • Every four years we send a file of survey participants to NDI to conduct a linkage and identify participant deaths
    • We take additional steps to try to improve the linkage
NDI Matching Algorithm
  • Social Security Number
  • First name
  • Middle initial
  • Last name
  • Month of birth
  • Year of birth
  • Sex
  • Father’s surname
  • State of birth
  • Race
  • State of residence
  • Marital Status
NCHS Record Linkage Program
  • To make sure we provide research quality data, we spend a lot of time processing the data to increase the chance of finding a true match
    • Try to increase the number of matches while minimizing false matches
  • Addressing name clean-up and naming conventions is a major activity
Methods – Name Clean-up
  • Fix invalid characters
  • Compress spaces
  • Remove titles/descriptors/suffixes
    • e.g., Mr., Baby, Jr.
  • Linkage uses NYSIIS phonetic codes
    • Accounts for misspellings or unusual spellings
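
The first three clean-up steps might look like the sketch below. It is a minimal illustration under stated assumptions, not the NCHS code: the set of titles is limited to the three examples on the slide, "invalid characters" is taken to mean anything other than letters, spaces, hyphens, and apostrophes, and accents are folded to plain letters. NYSIIS phonetic coding is not implemented here.

```python
import re
import unicodedata

# Hypothetical list; the slide gives only "Mr., Baby, Jr." as examples.
TITLES_AND_SUFFIXES = {"MR", "BABY", "JR"}

def clean_name(raw):
    """Standardize a raw name string before linkage."""
    # Fold accented letters to their base letters (assumption: ASCII-only output).
    folded = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode("ascii")
    name = folded.upper()
    # Replace invalid characters with spaces.
    name = re.sub(r"[^A-Z '\-]", " ", name)
    # Splitting and rejoining compresses runs of spaces; drop titles and suffixes.
    tokens = [t for t in name.split() if t not in TITLES_AND_SUFFIXES]
    return " ".join(tokens)

print(clean_name("  Mr.  John   Smith, jr. "))
print(clean_name("Baby O'Brien"))
```

A production list of titles and descriptors would be much longer, and some tokens (for example a surname that happens to equal a title) would need special handling.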
Methods – Name Clean-up
  • Create alternate records
    • Sent with original record
      • Among women, substitute alternate surnames (e.g., maiden name) for last name
      • Nicknames (using a look-up table)
        • Substituting Elizabeth for Beth
Nickname Lookup Table

Example: If first name=‘Andy’ then alternate record first name=‘Andrew’
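
As a rough sketch, the lookup rule above amounts to a dictionary from nickname to formal name. The two entries below come from the slides; the record layout and the function name are assumptions made for illustration, and a real lookup table would be far larger.

```python
# Nickname -> formal name. Both pairs appear in the slides.
NICKNAMES = {"ANDY": "ANDREW", "BETH": "ELIZABETH"}

def with_nickname_alternate(record):
    """Return the original record plus an alternate with the formal first name."""
    records = [record]
    formal = NICKNAMES.get(record["first"].upper())
    if formal is not None:
        # The alternate is sent along with, not instead of, the original.
        records.append({**record, "first": formal})
    return records

# Made-up record for illustration only.
for rec in with_nickname_alternate({"first": "Andy", "last": "SMITH"}):
    print(rec)
```

Because each alternate is an extra candidate record, it raises the number of potential matches, including false ones.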

Methods – Name Clean-up
  • Accounting for Hispanic and Asian naming conventions
    • Hispanic
      • Hispanic nickname lookup table
      • Switch middle and last names
    • Asian
      • Switch first and last names
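
The two swaps can be sketched as alternate-record generators. This is an illustration only: the record keys, the `convention` flag, and how a record would be assigned to a convention are all assumptions, and the sample name is made up.

```python
def swap_fields(record, field_a, field_b):
    """Return a copy of the record with two name fields exchanged."""
    alt = dict(record)
    alt[field_a], alt[field_b] = record[field_b], record[field_a]
    return alt

def naming_convention_alternates(record, convention):
    """Build alternate records for the conventions named on the slide."""
    if convention == "hispanic" and record.get("middle"):
        # Middle name may actually hold one of the two surnames.
        return [swap_fields(record, "middle", "last")]
    if convention == "asian":
        # Family name may have been recorded in the first-name field.
        return [swap_fields(record, "first", "last")]
    return []

# Made-up record for illustration only.
person = {"first": "MARIA", "middle": "GARCIA", "last": "LOPEZ"}
print(naming_convention_alternates(person, "hispanic"))
```

The original record is left unchanged, so the alternates can be submitted alongside it.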
Conclusions
  • Care needs to be taken to avoid false links
    • Alternate records increase the number of potential matches
      • If two men claim they’re Jesus, they can both be wrong
    • Need a higher level of scrutiny to determine that a pair of records match
Conclusions
  • Accounting for name differences and naming conventions improves quality of the linked-data product
  • We hope our efforts to account for Hispanic and Asian naming conventions reduce potential bias
    • Need to evaluate
Important Considerations
  • How are names collected?
  • How are the names recorded?
  • More likely to have formal names versus nicknames?
    • Surveys may differ from official documents
  • Are maiden names (surnames) available?
  • Are there consistent rules for recording names?
Acknowledgements
  • Dr. Jennifer Parker
  • Dr. Dean Judson

Thank you