29th International Traffic Records Forum

Using Multiple Imputation to Resolve the Missing Data Issue

“There are known ‘knowns’ (i.e., there are things we know we know). There are known ‘unknowns’ (i.e., there are things we know we don’t know), and there are unknown ‘unknowns’ (i.e., there are things we don’t know we don’t know).”

Donald Rumsfeld

Some Known Knowns
  • How many crash records there are
  • How many records are in the files we are matching
Some Known Unknowns
  • How many crash records should there be?
  • How many people are hospitalized for motor vehicle crashes?
  • How many people are transported by EMS for motor vehicle crashes?
An Unknown Unknown
  • What is the effect of missing data and missing links on the analysis?
Conditions for Multiple Imputation
  • Data must be “missing at random.”
  • The model used to generate the imputed values must be “correct.”
  • The analytic model must match up with the model used in the imputation.
Missing Completely at Random (MCAR)
  • The missing data are simply a random sample of all data values.
  • For example, in a data set of crash records, safety belt usage would be MCAR if two conditions held: people with safety belt usage reported had, on average, the same level of safety belt usage as people without it reported, and each of the other variables in the data set was, on average, the same for both groups.
  • In the case of MCAR, imputation is not needed, but missing data are rarely MCAR, and the observed data alone cannot confirm that data are MCAR.
Missing at Random (MAR)
  • The “missingness” of data for variable Y is unrelated to the value of Y but may be related to other variables in the data.
  • For example, in a crash data set, safety belt usage would be MAR if the probability of reporting safety belt usage is related to gender, but within each category of gender the probability of missing safety belt information is unrelated to the person’s safety belt usage.
  • In the case of MAR, imputation can provide better estimates of variances, measures of central tendency, confidence intervals, and standard deviations.
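
Once each imputed data set has been analyzed, the separate results are usually combined with Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and the total variance adds the between-imputation spread to the average within-imputation variance. A minimal sketch, assuming five imputed data sets; the estimates and variances are made-up values for illustration, and `pool_rubin` is a hypothetical helper name:

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total

# Hypothetical belt-usage proportions from m = 5 imputed data sets
estimates = [0.81, 0.79, 0.83, 0.80, 0.82]
variances = [0.0004, 0.0004, 0.0005, 0.0004, 0.0004]

q_bar, total_var = pool_rubin(estimates, variances)
pooled_se = math.sqrt(total_var)
print(f"pooled estimate = {q_bar:.3f}, pooled SE = {pooled_se:.4f}")
```

The between-imputation term is what carries the extra uncertainty due to the missing values; analyzing a single filled-in data set would leave it out.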
“Nonignorable” Missing Data
  • The “missingness” of data for variable Y is related to the value of Y.
  • For example, in a crash data set, safety belt usage would be “nonignorable” if the probability of safety belt usage being reported was related to whether a safety belt was used.
  • Imputation is not appropriate in the case of nonignorable missing data.
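
The three mechanisms can be compared with a small simulation. This is a sketch under invented assumptions: the belt-usage rates (70% for men, 90% for women) and the missingness probabilities are made up, and the gender-adjusted estimate is a simple reweighting that stands in for imputation because it relies on the same MAR assumption. It is not multiple imputation itself.

```python
import random

def simulate(mechanism, n=100_000, seed=1):
    """Return (true rate, complete-case rate, gender-adjusted rate)."""
    rng = random.Random(seed)
    total = {True: 0, False: 0}        # persons by gender (True = male)
    seen = {True: 0, False: 0}         # persons with belt usage reported
    seen_belted = {True: 0, False: 0}  # reported persons who were belted
    belted_all = 0
    for _ in range(n):
        male = rng.random() < 0.5
        belted = rng.random() < (0.70 if male else 0.90)
        total[male] += 1
        belted_all += belted
        if mechanism == "MCAR":
            p_missing = 0.30                        # same for everyone
        elif mechanism == "MAR":
            p_missing = 0.50 if male else 0.10      # depends on gender only
        else:                                       # nonignorable
            p_missing = 0.10 if belted else 0.50    # depends on belt usage itself
        if rng.random() >= p_missing:
            seen[male] += 1
            seen_belted[male] += belted
    true_rate = belted_all / n
    complete_case = sum(seen_belted.values()) / sum(seen.values())
    # Weight each gender's observed rate by that gender's share of all persons
    adjusted = sum(total[g] / n * seen_belted[g] / seen[g] for g in (True, False))
    return true_rate, complete_case, adjusted

results = {m: simulate(m) for m in ("MCAR", "MAR", "nonignorable")}
for name, (truth, cc, adj) in results.items():
    print(f"{name:13s} true={truth:.3f} complete-case={cc:.3f} adjusted={adj:.3f}")
```

Under these assumptions the complete-case rate is close to the truth only for MCAR, adjusting for gender recovers it for MAR, and neither estimate recovers it when the missingness depends on belt usage itself.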
Conditions for Multiple Imputation
  • Data must be “missing at random.”
  • The model used to generate the imputed values must be “correct.”
  • The analytic model must match up with the model used in the imputation.
Imputed Matches
  • Requires a good estimate of how many true matches are possible if the imputation model is to be “correct” and the number of imputed matches is to be plausible.
  • A simple solution would be to look at how many records are in each data set.
The Ideal World

Crash Records N = 360,000

EMS Records N = 68,500

Inpatient Records N = 17,000

Every EMS and inpatient record should link to a crash record
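
In this ideal world the record counts give an upper bound on the number of true matches, and the gap between that bound and the links actually found is the number of matches left to account for. A minimal sketch: the counts 360,000 and 68,500 come from the slide, while the 55,000 linked records and the helper name `match_shortfall` are hypothetical.

```python
def match_shortfall(n_crash, n_other, n_linked):
    """Return (upper bound on true matches, links still unaccounted for)."""
    upper_bound = min(n_crash, n_other)   # each record links at most once
    if n_linked > upper_bound:
        raise ValueError("more links than records: check for duplicates")
    return upper_bound, upper_bound - n_linked

# 55,000 observed EMS-to-crash links is an invented figure
bound, shortfall = match_shortfall(360_000, 68_500, 55_000)
print(f"possible matches: {bound:,}; not yet linked: {shortfall:,}")
```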

How Many Crash Records Should There Be?
  • Ideally, there should be one crash record for each person involved in a crash.
  • Crash reporting systems vary from state to state (some collect information on all persons involved, some collect information only on drivers, others collect information only on injured people, and others collect information only for injury crashes).
Are There Duplicate Crash Records?
  • Duplicates often occur through data processing methods (e.g., updating of records, re-submission of data, multiple reports)
  • Duplicates are a relatively easy problem to deal with.
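
When duplicates come from updates or re-submissions, one common approach is to keep only the most recent version of each record. A minimal sketch, assuming each record carries a crash identifier, a person number, and an update date; the field names `crash_id`, `person_no`, and `updated` are hypothetical and will differ by state.

```python
def drop_duplicates(records, key_fields=("crash_id", "person_no"),
                    version_field="updated"):
    """Keep the most recently updated record for each key."""
    latest = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings
        if key not in latest or rec[version_field] > latest[key][version_field]:
            latest[key] = rec
    return list(latest.values())

crash_records = [
    {"crash_id": "C1", "person_no": 1, "updated": "2003-01-05", "belt": None},
    {"crash_id": "C1", "person_no": 1, "updated": "2003-02-10", "belt": "yes"},
    {"crash_id": "C1", "person_no": 2, "updated": "2003-01-05", "belt": "no"},
]
unique_records = drop_duplicates(crash_records)
print(len(unique_records), "unique person records")
```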
Are There Missing Crash Records?
  • Missing records can occur through data processing methods.
  • Missing records can also occur due to failure to report or reporting thresholds.
  • Some people are injured in crashes that occur out of state (i.e., they may appear in the inpatient hospital file but not in the crash file).
Some Methods to Check for Missing Crash Records
  • Check number of records by date and submitting entity
  • Cross-reference data sets
  • Population-based rates
  • Historical trends
  • Check referential integrity of the data
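
The first check, counting records by date and submitting entity, can be sketched as a tabulation that flags months in which an agency submitted far fewer records than usual. The field names `agency` and `date`, the 50% threshold, and the sample counts are assumptions made for illustration.

```python
from collections import Counter
from statistics import median

def flag_gaps(records, months, threshold=0.5):
    """Flag (agency, month, count) where count < threshold * agency median."""
    counts = Counter((r["agency"], r["date"][:7]) for r in records)
    agencies = sorted({agency for agency, _ in counts})
    flags = []
    for agency in agencies:
        # A Counter returns 0 for months with no records at all
        series = [counts[(agency, m)] for m in months]
        typical = median(series)
        for month, count in zip(months, series):
            if count < threshold * typical:
                flags.append((agency, month, count))
    return flags

# Invented submissions: agency A sent nothing in February
records = (
    [{"agency": "A", "date": "2003-01-15"}] * 10
    + [{"agency": "A", "date": "2003-03-02"}] * 12
    + [{"agency": "B", "date": "2003-01-20"}] * 5
    + [{"agency": "B", "date": "2003-02-11"}] * 6
    + [{"agency": "B", "date": "2003-03-30"}] * 5
)
gaps = flag_gaps(records, months=["2003-01", "2003-02", "2003-03"])
print(gaps)
```

A flagged month is only a prompt to ask the submitting entity; a genuinely quiet month looks the same as a lost batch.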
Are There Duplicate EMS Records?
  • Duplicates can occur through data processing methods (e.g., updating of records, re-submission of data, multiple reports)
  • Duplicates may result from the way data is reported.
  • Duplicates may result from there really being more than a single event.
More Than a Single EMS Event
  • Often there are multiple providers involved in EMS services (e.g., ALS, BLS, Air and ground, inter-facility transport).
  • Sometimes records for the same event can be identified by a common incident number or by looking at response outcome or incident type.
  • CODES 2000 can be used to do a self-match of records.
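
Where a common incident number exists, records for the same event can be gathered with a simple exact-match grouping. The sketch below is a deterministic stand-in and does not reproduce the self-match in CODES 2000; the field names `incident_no` and `provider` are hypothetical.

```python
from collections import defaultdict

def group_by_incident(records):
    """Collect EMS records that share an incident number."""
    events = defaultdict(list)
    for rec in records:
        events[rec["incident_no"]].append(rec)
    return dict(events)

ems_records = [
    {"incident_no": "I-100", "provider": "BLS ground"},
    {"incident_no": "I-100", "provider": "ALS ground"},
    {"incident_no": "I-101", "provider": "Air"},
]
events = group_by_incident(ems_records)
# Events served by more than one provider
multi = {k: [r["provider"] for r in v] for k, v in events.items() if len(v) > 1}
print(len(events), "events;", len(multi), "with multiple providers")
```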
Are There Missing EMS Records?
  • Missing records can occur through data processing methods
  • Missing records can also occur due to failure to report or reporting thresholds.
Some Methods to Check for Missing EMS Records
  • Check number of records by date and submitting entity
  • Cross-reference data sets
  • Population-based rates
  • Historical trends
  • Check referential integrity of the data
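
A population-based rate check compares each area's records per 100,000 residents with the overall rate and flags areas that fall well below it. This is a sketch with invented county counts and populations; the 50% cutoff and the helper name `flag_low_rates` are assumptions.

```python
def flag_low_rates(counts, population, ratio=0.5):
    """Return the overall rate and the areas whose rate is below ratio * overall."""
    overall = sum(counts.values()) / sum(population.values()) * 100_000
    flagged = {}
    for area, pop in population.items():
        rate = counts.get(area, 0) / pop * 100_000
        if rate < ratio * overall:
            flagged[area] = round(rate, 1)
    return overall, flagged

# Hypothetical EMS transport counts and populations for three counties
ems_counts = {"County A": 500, "County B": 480, "County C": 20}
population = {"County A": 100_000, "County B": 100_000, "County C": 100_000}

overall_rate, low = flag_low_rates(ems_counts, population)
print(f"overall rate: {overall_rate:.1f} per 100,000; flagged: {low}")
```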
Are There Duplicate Inpatient Records?
  • Duplicates can occur through data processing methods (e.g., updating of records, re-submission of data, multiple reports)
  • Duplicates may result from the way data is reported.
  • Duplicates may result from there really being more than a single event.
More Than a Single Discharge
  • Patients may be discharged and re-admitted multiple times (e.g., complications, late effects, rehabilitation, surgery).
  • The “frequent flyer” phenomenon.
  • About 10% of motor vehicle crash victims have more than a single admission.
  • Routines are available in SAS, SPSS, and Perl to array multiple records into a single record.
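
The same arraying step can be sketched in Python: group discharge records by patient, order them by admission date, and spread them across numbered fields in one record. The field names `patient_id`, `admit_date`, and `discharge_date` are hypothetical, and the sketch assumes a reliable patient identifier is available.

```python
from collections import defaultdict

def array_discharges(records):
    """Turn multiple discharge records per patient into one wide record."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)
    arrayed = {}
    for pid, stays in by_patient.items():
        stays.sort(key=lambda r: r["admit_date"])   # ISO dates sort as strings
        row = {"patient_id": pid, "n_admissions": len(stays)}
        for i, stay in enumerate(stays, start=1):
            row[f"admit_date_{i}"] = stay["admit_date"]
            row[f"discharge_date_{i}"] = stay["discharge_date"]
        arrayed[pid] = row
    return arrayed

discharges = [
    {"patient_id": "P1", "admit_date": "2003-03-01", "discharge_date": "2003-03-04"},
    {"patient_id": "P1", "admit_date": "2003-01-10", "discharge_date": "2003-01-20"},
    {"patient_id": "P2", "admit_date": "2003-02-02", "discharge_date": "2003-02-03"},
]
patients = array_discharges(discharges)
print(patients["P1"])
```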
Are There Missing Inpatient Records?
  • Missing records can occur through data processing methods.
  • Missing records can also occur due to failure to report.
  • Missing hospital records can also occur if the patient is hospitalized out of state.
More Information on Multiple Imputation
  • Little RJA, Rubin DB – Statistical Analysis with Missing Data
  • Schafer JL – Analysis of Incomplete Multivariate Data
  • Rubin DB – Multiple Imputation After 18+ Years, Journal of the American Statistical Association, June 1996, pp. 473-489.
  • NHTSA – Transitioning to Multiple Imputation – New Method to Impute Blood Alcohol Concentration in FARS
  • WWW.stat.psu.edu/~jls/mifaq/html