
Missing Data & Measurement Error

Welcome to Rachel Whitaker

Bio753—Advanced Methods

in Biostatistics, III


Overview

  • Missing data are inevitable

  • Some missing data are “inherent”

  • Prevention is better than statistical “cures”

  • Too much missing information invalidates a study

  • There are many methods for accommodating missing data

    • Their validity depends on the missing data mechanism and the analytic approach

  • Issues can be subtle

  • A little data on the missingness process can be helpful

Common types of missing data

  • Survey non-response

  • Missing dependent variables

  • Missing covariates

  • Dropouts

  • Censoring

    • administrative, due to competing events or due to loss to follow-up

  • Non-reporting or delayed reporting

  • Noncompliance

  • Measurement error

Implications of missing data

Missing data produce or induce:

  • Unbalanced data

  • Loss of information and reduced efficiency

  • Extent of information loss depends on

    • Amount of missingness

    • Missingness pattern

    • Association between the missing and observed data

    • Parameters of interest

    • Method of analysis

  • Care is needed to avoid biased inferences, i.e., inferences that target a reference population other than the one intended

    • e.g., only those who stay in the study

Inherent missingness

Right-censoring

  • We know only that the event has yet to occur

    • Issue: “No news is no news” versus “no news is good news”

Latent disease state

  • Disease Free (DF) / Latent Disease (LD) / Clinical Disease (CD)

    • Screen and discover latent disease

    • We know only that the transition DF → LD occurred before the screening time and that LD → CD has yet to occur

Missing Data Mechanisms

Little RJA, Rubin D. Statistical analysis with missing data. Chichester, NY: John Wiley & Sons; 2002

Missing Completely at Random (MCAR)

  • Pr(missing) is unrelated to the process under study

Missing at Random (MAR)

  • Pr(missing) depends only on observed data

Not Missing at Random (NMAR)

  • Pr(missing) depends on both observed and unobserved data

These distinctions are important because the validity of an analysis depends on the missing data mechanism.

Notation (for a missing dependent variable in a longitudinal study)

i indexes participant (unit), i = 1,…,n

j indexes measurement (sub-unit), j = 1,…,J

  • Potential response vector

    Yi = (Yi1, Yi2, …, YiJ)

  • Response Indicators

    Ri = (Ri1, Ri2, …, RiJ)

    Rij = 1 if Yij is observed and Rij = 0 if Yij is missing

  • Given Ri, Yi can be partitioned into two components:

    YiO observed responses

    YiM missing responses
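The partition can be sketched in a few lines of Python. This is only an illustration: the response values and the indicator vector below are made up, and in a real study the missing component is of course never seen.

```python
import numpy as np

# Hypothetical potential responses for one participant (J = 5 occasions).
y_i = np.array([5.1, 5.4, 5.2, 5.9, 6.3])
# Response indicators: 1 = observed, 0 = missing.
r_i = np.array([1, 0, 1, 1, 0])

observed = r_i == 1
y_obs = y_i[observed]    # YiO: the observed responses
y_mis = y_i[~observed]   # YiM: the missing responses (unknown in practice)
```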

Schematic Representation of Response vector and Response indicators

  • E.g., Y2 = (Y21, Y22, Y23, …, Y2J), R2 = (1, 0, 1, …, 1)

    • Y2O = (Y21, Y23, …, Y2J), Y2M = (Y22)

More general missing data

  • A similar notation can be used for missing regressors (Xij) and for missing components of an even more general data structure

  • Using “Y” to denote all of the potential data (regressors, dependent variable, etc.), the foregoing notation applies in general

Missing Data Mechanisms

  • Some mechanisms are relatively benign and do not complicate or bias an analysis

  • Others are not benign and can induce bias

Example

  • Goal is to predict weight from gender and height

  • Use information from Bio656 students

  • Possible reasons for missing data

    • Absence from class

    • Gender-associated non-response

    • Weight-associated non-response

How would each of the above reasons affect results?

Missing Completely at Random (MCAR)

  • Missingness is a chance mechanism that does not depend on observed or unobserved responses

    • Ri is independent of both YiO and YiM

      Pr(Ri | YiO , YiM ) = Pr(Ri)

  • In the weight survey example, missingness due to absence from class is unlikely to be related to the relation between weight, height and gender

  • The dataset can be regarded as a random sample from the target population (the full class, Bio620 over the years, ....)

  • A complete-case analysis is appropriate, albeit with a drop in efficiency relative to obtaining more data

Missing Completely at Random (MCAR)

[Figure: axis label Height (cm)]

  • The probability of having a missing value for variable Y is unrelated to the value of Y or to any other variables in the data set

  • A complete-case analysis is appropriate

Missing at random (MAR)

  • Missingness may depend on the observed responses, but not on the values that would have been measured but were not collected

    Pr(Ri|YiO,YiM) = Pr(Ri|YiO)

  • The observed data are not a random sample from the full population

    • In the weight survey example, data are MAR if Pr(missing weight) depends on gender or height but not on weight

  • Even though not a random sample, the distribution of YiM conditional on YiO is the same as that in the reference population (the full class)

  • Therefore, YiM can be validly predicted using YiO

    • Of course, validity depends on having a correct model for the mean and dependency structure for the observed data

  • But we don’t need to make these predictions to get valid inferences

Missing at random (MAR)

[Figure: axis label Height (cm)]

  • The probability of missing data on Y is unrelated to the value of Y, after controlling for other variables in the analysis

  • Analysis using the wrong model is not valid

    • e.g., uncorrelated regression, when correlation is needed

A complete-case analysis gives a valid slope when selection is on the predictors, BUT the correlation will be biased.
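A small simulation can illustrate the last point. It is a sketch under assumed values: the height distribution, the weight-on-height relation and the selection rule (weight recorded only when height exceeds 170 cm) are all hypothetical choices, not taken from the class data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
height = rng.normal(170.0, 10.0, n)                     # predictor (assumed distribution)
weight = 3.0 + 1.0 * height + rng.normal(0.0, 8.0, n)   # response (assumed relation)

# Selection on the predictor only: weight is recorded when height > 170 cm.
seen = height > 170.0

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

slope_full = ols_slope(height, weight)
slope_cc = ols_slope(height[seen], weight[seen])
corr_full = np.corrcoef(height, weight)[0, 1]
corr_cc = np.corrcoef(height[seen], weight[seen])[0, 1]
```

With these settings the complete-case slope stays close to the full-data slope, while the complete-case correlation is noticeably smaller because the selected heights have a restricted range.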

When the mechanism is MAR

  • Complete-case methods and standard regression methods based on all the available data can produce biased estimates of mean response or trends

  • If the statistical model for the observed data is correct, likelihood-based methods using only the observed data are valid

  • This requires that the joint distribution of the observed Yi's is correctly specified, for example

    • when the mean and covariance are correct

    • when using a correct GEE working model

    • when using correct random effects

Ignorability

  • With a correct model for the observed data, under MAR the details of the missing data mechanism are not needed; the mechanism is ignorable

    • Ignorability is not an inherent property of the mechanism

    • It depends on the mechanism and on the analytic model

Not missing at random (NMAR)

  • Missingness depends on the responses that could have been observed

    Pr(Ri|YiO,YiM) does depend on YiM

  • The observed data cannot be viewed as a random sample of the complete data

  • The distribution of YiM conditional on YiO is not the same as that in the reference population (the full class)

  • The distribution of YiM depends on YiO, on Pr(Ri|YiO,YiM), and on Pr(Y)

  • In the weight survey example, data are NMAR if missingness depends on weight

Missing Data Mechanisms: Not missing at random (NMAR)

[Figure: axis label Height (cm)]

  • Also known as

    • Non-ignorable missing

  • The probability of missing data on Y is related to the value of Y even if we control for other variables in the analysis.

  • A complete-case analysis is NOT valid

  • Any analysis that does not take dependence on Y into account is not valid

  • Inferences are highly model dependent
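The three mechanisms can be compared in a simulated version of the weight survey. This is a sketch only: the data-generating values and the three response-probability functions are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
height = rng.normal(170.0, 10.0, n)                     # assumed distribution (cm)
weight = 3.0 + 1.0 * height + rng.normal(0.0, 8.0, n)   # assumed relation

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Probability that weight is recorded under each mechanism (all assumed forms).
p_observed = {
    "MCAR": np.full(n, 0.5),                    # unrelated to the data
    "MAR": expit(-(height - 170.0) / 10.0),     # depends on observed height only
    "NMAR": expit(-(weight - 173.0) / 10.0),    # depends on weight itself
}

def complete_case_fit(seen):
    """Slope of weight on height and mean weight among complete cases."""
    x, y = height[seen], weight[seen]
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return slope, y.mean()

results = {name: complete_case_fit(rng.random(n) < p)
           for name, p in p_observed.items()}
```

In this setup the complete-case slope is close to the true value of 1 under MCAR and under MAR (selection is on the predictor), but it is attenuated under NMAR. The complete-case mean weight is close to the population mean of 173 only under MCAR.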

MAR for Y vs X [Y | X]; NMAR for cor(X,Y)

When the mechanism is NMAR

  • Almost all standard methods of analysis are invalid

    • Valid inferences require joint modeling of the response and the missing data mechanism Pr(Ri|YiO,YiM)

  • Importantly, assumptions about Pr(Ri|YiO,YiM) cannot be empirically verified using the data at hand

  • Sensitivity analyses can be conducted

    (Dan Scharfstein’s research focus)

  • Obtaining values from some missing Ys can inform on the missing data mechanism

Dropouts (if missing, missing thereafter)

Dropout Completely at Random

  • Dropout at each occasion is independent of all past, current, and future outcomes

    • This is assumed for the Kaplan-Meier estimator and the Cox proportional hazards model (PHM)

Dropout at Random

  • Dropout depends on the previously observed outcomes up to, but not including, the current occasion

    • i.e., given the observed outcomes, dropout is independent of the current and future unobserved outcomes

Dropout Not at Random (“informative dropout”)

  • Dropout depends on current and future unobserved outcomes



Probability of a follow-up lung function measurement depends on smoking status and current lung function

Is the mechanism MAR?

We don’t know!

Lung function decline in adults

Longitudinal dropout example

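  • Repeated measurements $Y_{it}$

    $i$ indexes people, $i = 1, \ldots, n$

    $t$ indexes time, $t = 1, \ldots, 5$

    $Y_{it} = \mu_{it} + e_{it} = \beta_0 + \beta_1 t + e_{it}$

    $\mathrm{cov}(e_{is}, e_{it}) = \sigma^2 \rho^{|s-t|}$, $\rho \ge 0$

  • $\beta_0 = 5$, $\beta_1 = 0.25$, $\sigma^2 = 1$, $\rho = 0.7$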
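Complete data from this model can be simulated as follows. The sketch reads the garbled parameter line as intercept 5, slope 0.25, unit error variance and correlation 0.7 that decays geometrically with the lag; the sample size and the normal error distribution are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 50_000, 5
beta0, beta1, sigma2, rho = 5.0, 0.25, 1.0, 0.7

t = np.arange(1, T + 1)
mu = beta0 + beta1 * t
# Covariance of the errors: sigma^2 * rho^|s - t|
cov = sigma2 * rho ** np.abs(np.subtract.outer(t, t))
y = mu + rng.multivariate_normal(np.zeros(T), cov, size=n)   # shape (n, T)

occasion_means = y.mean(axis=0)
lag1_corr = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
```

With complete data the occasion means track the population line 5 + 0.25t, and the lag-1 correlation is close to 0.7.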

Longitudinal dropout example: the dropout mechanism

  • Dropout indicator, Di

  • Di = k if person i drops out between the (k-1)st and kth occasion

  • Assume that

  • Dropout is MCAR if q2 = q3 = 0

  • Dropout is MAR if q3 = 0

  • Dropout is NMAR if q3 ≠ 0
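The model equation that followed “Assume that” is not present in this transcript. One form consistent with the three conditions above is a logistic model in which q1 is an intercept, q2 multiplies the previous (observed) outcome and q3 multiplies the current (possibly unobserved) outcome. The sketch below uses that form as an assumption, and additionally centers each outcome at its population mean, which is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 200_000, 5
t = np.arange(1, T + 1)
mu = 5.0 + 0.25 * t
cov = 0.7 ** np.abs(np.subtract.outer(t, t))
y = mu + rng.multivariate_normal(np.zeros(T), cov, size=n)

def observed_means(q1, q2, q3):
    """Mean of the observed outcomes at each occasion under the assumed model."""
    in_study = np.ones(n, dtype=bool)
    means = [y[:, 0].mean()]            # everyone is observed at occasion 1
    for k in range(1, T):               # k is the 0-based index of the current occasion
        # Assumed form: logit Pr(dropout) = q1 + q2 * previous + q3 * current,
        # with both outcomes centered at their population means.
        logit = q1 + q2 * (y[:, k - 1] - mu[k - 1]) + q3 * (y[:, k] - mu[k])
        drops = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))
        in_study &= ~drops
        means.append(y[in_study, k].mean())
    return np.array(means)

means_mcar = observed_means(-0.5, 0.0, 0.0)
means_mar = observed_means(-0.5, 0.5, 0.0)
means_nmar = observed_means(-0.5, 0.0, 0.5)
```

Under MCAR the observed means follow the population line. Under MAR and NMAR with positive q2 or q3, people with high outcomes leave first, so the observed means at later occasions fall below the population line.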

Population Regression Line vs. Observed Data Means

[Figure: three panels plotting Y (5 to 6.5) against T (1 to 5): MCAR (q1 = -0.5, q2 = q3 = 0); MAR (q1 = -0.5, q2 = 0.5, q3 = 0); NMAR (q1 = -0.5, q2 = 0, q3 = 0.5)]


Analysis results

The true regression parameters are intercept = 5.0 and slope = 0.25; $\rho = 0.7$

Misspecified GEE (when the truth is random intercepts and slopes)

[Figure: two panels plotting Y against Time: Complete Data (GEE); Partial Missing Data (GEE)]

Correctly specified Random Effects (when the truth is random intercepts and slopes)

[Figure: two panels plotting Y against Time: Complete Data (REM); Partial Missing Data (REM)]

The probability of dropping out depends on the observed history

One step at a time

There are 5 different “trajectories” with relative weights 2, 2, 1, 1, 2

The OLS analysis has regressors 0, 1, 2 and dependent variables

0, ,2

The Indep. Increments analysis has a constant regressor “1” and so is just estimating the mean. The dependent variable is either + or -

If the missing data process is MAR and if we use the correct model for the observed data, the missing data mechanism is “ignorable”

  • In the foregoing example, computing first differences (current value – previous value) and averaging these differences gives an unbiased estimate (of 0) no matter how complicated the MAR missing data process

  • We don’t have to know the details of the dropout process (it can be very complicated), as long as the probabilities depend only on what has been observed and not on what would have been observed

  • Ignorability depends on using the correct model for the observed data (mean and dependency structure)

  • If the errors were independent (rather than the first differences), then standard OLS would be unbiased

Analytic Approaches

Complete Case Analysis

  • Global complete case analysis

  • Individual model complete case analysis

  • Augment with missing data indicators

    • primarily for missing Xs

  • Weighting

  • Imputation

    • Single

    • Multiple

  • Likelihood-based (model-based) methods

Analytic Approaches

Global complete-case analysis (use only data for people with fully complete data)

  • Biased, unless the dropout is MCAR

  • Even if MCAR is true, can be immensely inefficient

Analyze available data (use data for people with complete data on the regressors in the current model)

  • More efficient than complete-case methods, because it uses the maximal amount of data

  • Biased unless the dropout is MCAR

  • Can produce floating datasets, producing “illogical” conclusions

    • R² relations are not monotone

Use missing data indicators (e.g., create new covariates)

Weighting

  • Stratify samples into J weighting classes

    • Zip codes

    • propensity score classes

  • Weight the observed data inversely according to the response rate of the stratum

    • Lower response rate → higher weight

  • Unbiased if observed data are a random sample in a weighting class (a special form of the MAR assumption)

  • Biased, if respondents differ from non-respondents in the class

  • Difficult to estimate the appropriate standard error because weights are estimated from the response rates

Simple example of weighting adjustment

  • Estimate the average height of villagers in two villages

  • Surveys sent to 10% of the population in both villages

  • Direct, unweighted: 1.7*(2/3) + 1.4*(1/3) = 1.60 m

  • Weighted: 100*1.7*0.005 + 50*1.4*0.01 = 1.55 m (= 1.7*0.5 + 1.4*0.5)
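The arithmetic can be reproduced in Python. The table of survey counts did not survive in this transcript, so the counts below are assumptions chosen to match the numbers shown: equal village sizes, 100 surveys sent to each village, and 100 and 50 respondents with mean heights 1.7 m and 1.4 m.

```python
# Hypothetical survey counts consistent with the arithmetic above.
sent = {"A": 100, "B": 100}
responded = {"A": 100, "B": 50}
mean_height = {"A": 1.7, "B": 1.4}   # mean height (m) among respondents

n_resp = sum(responded.values())
unweighted = sum(responded[v] * mean_height[v] for v in sent) / n_resp

# Weight each respondent by the inverse of the village response rate.
weight = {v: sent[v] / responded[v] for v in sent}
total_weight = sum(responded[v] * weight[v] for v in sent)
weighted = sum(responded[v] * weight[v] * mean_height[v] for v in sent) / total_weight
```

Respondents in village B get twice the weight of those in village A, so each village contributes half of the weighted mean, as its share of the population requires.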

Single Imputation

  • Fill in missing values with imputed values

  • Once a filled-in dataset has been constructed, standard methods for complete data can be applied

Problem

  • Fails to account for the uncertainty inherent in the imputation of the missing data

  • Don’t use it!

Multiple Imputation (Rubin 1987; Little & Rubin 2002)

  • Multiply impute “m” pseudo-complete data sets

    • A small number of imputations (e.g., 5 ≤ m ≤ 10) is generally sufficient

  • Combine the inferences from each of the m data sets

  • Acknowledges the uncertainty inherent in the imputation process

  • Equivalently, the uncertainty induced by the missing data mechanism

  • Rubin DB. Multiple Imputation for Nonresponse in Surveys, Wiley, New York, 1987

  • Little RJA, Rubin D. Statistical analysis with missing data. Chichester, NY: John Wiley & Sons; 2002

Multiple Imputation: Combining Inferences

  • Combine m sets of parameter estimates to provide a single estimate of the parameter of interest

  • Combine uncertainties to obtain valid SEs

  • In the following, “k” indexes imputation

This computation is correct for fully efficient estimators.

Within-imputation variance

Between-imputation variance
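The slide's formulas are not present in this transcript. The standard combining rules from Rubin (1987) take the pooled estimate as the mean of the m estimates, the within-imputation variance as the mean of the m squared standard errors, the between-imputation variance as the sample variance of the m estimates, and the total variance as within + (1 + 1/m) × between. A minimal sketch, with made-up inputs and a hypothetical function name:

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Pool m point estimates and their squared SEs using Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                 # pooled point estimate
    w = u.mean()                     # within-imputation variance
    b = q.var(ddof=1)                # between-imputation variance
    total = w + (1.0 + 1.0 / m) * b  # total variance
    return q_bar, w, b, total

# Hypothetical estimates and squared SEs from m = 5 imputed data sets.
q_bar, w, b, total = rubin_combine([1.0, 1.2, 0.9, 1.1, 0.8],
                                   [0.04, 0.05, 0.04, 0.05, 0.04])
pooled_se = total ** 0.5
```

The pooled standard error is larger than the square root of the within-imputation variance alone; the difference reflects the extra uncertainty caused by the missing data.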


Within-imputation covariance

Between-imputation covariance
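For a vector of parameters, the standard combining rules can be written as below. The notation is an assumption, since the slide's own formulas are not in this transcript: $\hat{Q}_k$ is the estimate from imputation $k$ and $\hat{U}_k$ is its estimated covariance matrix.

```latex
\bar{Q} = \frac{1}{m}\sum_{k=1}^{m} \hat{Q}_k, \qquad
\bar{W} = \frac{1}{m}\sum_{k=1}^{m} \hat{U}_k, \qquad
B = \frac{1}{m-1}\sum_{k=1}^{m} (\hat{Q}_k - \bar{Q})(\hat{Q}_k - \bar{Q})^{\top}, \qquad
T = \bar{W} + \left(1 + \frac{1}{m}\right) B
```

Here $\bar{W}$ is the within-imputation covariance, $B$ is the between-imputation covariance, and $T$ is the total covariance of the pooled estimate $\bar{Q}$.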

Producing the Imputed Values

Last value carried forward (LVCF)

  • Single Imputation (never changes)

  • Assumes the responses following dropout remain constant at the last observed value prior to dropout

  • Unrealistic unless, say, due to recovery or cure

  • Underestimates SEs

Hot deck

  • Randomly choose a fill-in from outcomes of “similar” units

  • Distorts distribution less than imputing the mean or LVCF

  • Underestimates SEs
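For concreteness, LVCF amounts to the following fill rule. The sketch uses a made-up series with dropout after the second visit and a hypothetical helper name; it is shown to make the assumption visible, not as a recommendation.

```python
import math

def lvcf(values):
    """Carry the last observed value forward over missing entries (NaN)."""
    filled, last = [], math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled

# Hypothetical outcome series: the participant drops out after visit 2.
series = [5.2, 5.6, math.nan, math.nan, math.nan]
filled = lvcf(series)
```

The filled-in series is flat after dropout, which is exactly the "responses remain constant" assumption, and the three imputed values add no real information even though a standard analysis would treat them as data.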

Valid Imputation

Build a model relating observed outcomes

  • Means and covariances and random effects, ...

  • Goal is prediction, so be liberal in including predictors

  • Don’t use P-values; don’t use step-wise

  • Do use multiple R², prediction sums of squares, cross-validation, ...

Producing Imputed Values

Sample values of YiM from pr(YiM|YiO, Xi)

  • Can be straightforward or difficult

  • Monotone case: draw values of YiM from pr(YiM|YiO,Xi) in a sequential manner

  • Valid when dropouts are MAR or MCAR

Propensity Score Method

  • Imputed values are obtained from observations on people who are equally likely to drop out as those lost to follow up at a given occasion

  • Requires a model for the propensity (probability) of dropping out, e.g.,

Producing Imputed Values

Recall that “Y” is all of the data, not just the dependent variable

Predictive Mean Matching (build a regression model!)

  • A series of regression models for $Y_{ik}$, given $Y_{i1}, \ldots, Y_{i(k-1)}$, is fit using the observed data on those who have not dropped out by the $k$th occasion. For example,

    $E(Y_{ik}) = \beta_1 + \beta_2 Y_{i1} + \cdots + \beta_k Y_{i(k-1)}$

    $V(Y_{ik}) = \sigma^2$

    This yields $\hat{\beta}$ and $\hat{\sigma}^2$

  1. Parameters $\beta^*$ and $\sigma^{2*}$ are then drawn from the distribution of the estimated parameters (to account for the uncertainty in the estimated regression)

  2. Missing values can then be predicted from

    $\beta_1^* + \beta_2^* Y_{i1} + \cdots + \beta_k^* Y_{i(k-1)} + \sigma^* e_i$,

    where $e_i$ is simulated from a standard normal distribution

  3. Repeat 1 and 2
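The steps as listed (draw parameters, then predict with simulated noise) can be sketched for two occasions. Everything numeric here is an assumption: the simulated data, the MAR dropout rule, and the use of the usual normal-theory draws (a scaled inverse chi-square draw for the variance, then a normal draw for the coefficients). The sketch follows the listed steps and does not include a donor-matching step.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
y1 = rng.normal(5.0, 1.0, n)
y2 = 1.0 + 0.8 * y1 + rng.normal(0.0, 0.6, n)
# Assumed MAR dropout: a higher Y1 makes Y2 more likely to be missing.
missing = rng.random(n) < 1.0 / (1.0 + np.exp(-(y1 - 5.0)))

# Fit the regression of Y2 on Y1 among those still in the study.
X_obs = np.column_stack([np.ones((~missing).sum()), y1[~missing]])
X_mis = np.column_stack([np.ones(missing.sum()), y1[missing]])
beta_hat, *_ = np.linalg.lstsq(X_obs, y2[~missing], rcond=None)
resid = y2[~missing] - X_obs @ beta_hat
df = X_obs.shape[0] - X_obs.shape[1]
xtx_inv = np.linalg.inv(X_obs.T @ X_obs)

def impute_once():
    # Step 1: draw sigma^2* and beta* to reflect uncertainty in the fitted regression.
    sigma2_star = resid @ resid / rng.chisquare(df)
    beta_star = rng.multivariate_normal(beta_hat, sigma2_star * xtx_inv)
    # Step 2: predict the missing values and add simulated residual noise.
    e = rng.standard_normal(missing.sum())
    y2_imp = y2.copy()
    y2_imp[missing] = X_mis @ beta_star + np.sqrt(sigma2_star) * e
    return y2_imp

# Step 3: repeat to obtain m completed data sets.
m = 5
completed = [impute_once() for _ in range(m)]
mi_mean = np.mean([c.mean() for c in completed])
cc_mean = y2[~missing].mean()
```

In this simulation the mean of Y2 across the completed data sets is close to the full-data mean, while the complete-case mean is too low because people with high Y1 were more likely to drop out.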

Missing, presumed at random

Cost-analysis with incomplete data*

  • Estimate the difference in cost between transurethral resection (TURP) and contact-laser vaporization of the prostate (Laser)

  • 100 patients were randomized to one of the two treatments

    • TURP: n = 53; Laser: n = 47

  • 12 categories of medical resource usage were measured

    • e.g., GP visit, transfusion, outpatient consultation, etc.

* Briggs A et al. Health Economics. 2003; 12, 377-392

Missing data

Complete-case analysis uses only half of the patients in the study even though 90% of resource usage data were available

Comparison of inferences

Note that mean imputation understates uncertainty.

Multiple Imputation versus likelihood analysis when data are MAR

  • Both multiple imputation and use of a valid statistical model for the observed data (likelihood analysis) are valid

    • The model-based analysis will be more efficient, but more complicated

  • Validity of each depends on correct modeling to produce/induce ignorability

What if you doubt the MAR assumption? (You should always doubt it!)

You can never empirically rule out NMAR

  • Methods for NMAR exist, but they require information and assumptions on

    pr(Missing | observed, unobserved)

  • Methods depend on unverifiable assumptions

  • Sensitivity analysis can assess the stability of findings under various scenarios

    • Set bounds on the form and strength of the dependence

    • Evaluate conclusions within these bounds
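One simple way to set such bounds is a shift ("delta") adjustment: assume the mean of the unobserved outcomes differs from the mean of the observed outcomes by delta, and recompute the overall mean over a range of deltas. The sketch below is only an illustration of that idea; the data values and the range of deltas are made up.

```python
# Hypothetical observed outcomes and number of dropouts.
observed = [5.1, 5.4, 5.0, 5.6, 5.3, 5.2]
n_missing = 4

obs_mean = sum(observed) / len(observed)
p_missing = n_missing / (len(observed) + n_missing)

# delta = assumed difference between the mean of the missing outcomes and the
# mean of the observed outcomes; delta = 0 reproduces the complete-case mean.
deltas = [-1.0, -0.5, 0.0, 0.5, 1.0]
overall_mean = {d: obs_mean + p_missing * d for d in deltas}
```

If the substantive conclusion is the same across the whole range of plausible deltas, the finding is stable; if it changes within the range, the result depends on an assumption the data cannot check.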

MEASUREMENT ERROR

If a covariate (X) is measured with error, what is the implication for the regression of Y on X?

See also “Air” and “Cervix” in volume II of the BUGS examples.

Measurement Error: Another type of missing data

  • Measurement error is a special case of missing data because we do not get to “observe the true value” of the response or covariates

  • Depending on the measurement error mechanism and on the analysis, inferences can be

    • inefficient (relative to no measurement error)

    • biased

The two “Pure Forms” relating Xt & Xo

Classical: $X_o = X_t + \delta$, $\delta \sim (0, \sigma_\delta^2)$

What you see is a random deviation from the truth

  • Measured & true blood pressure

  • Measured and true social attitudes

Berkson: $X_t = X_o + \delta$

The truth is a random deviation from what you see

  • Individual SES measured by ZIP-code SES

  • Personal air pollution measured by centrally monitored value

  • Actual temperature & thermostat setting

Hybrids are possible

Xt and Xo have a general joint distribution

Measurement error’s effect on a simple regression coefficient

Classical

  • The regression coefficient on Xo is attenuated towards 0 relative to the “true” regression coefficient on Xt

  • This is because the spread of Xo is greater than that of Xt

Berkson

  • No effect on the expected regression coefficient

  • Variance inflation

Berkson

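$X_t = X_o + \delta$, $\delta \sim (0, \sigma_\delta^2)$

true: $Y = \text{int} + \beta X_t + \text{resid}$

$\quad = \text{int} + \beta (X_o + \delta) + \text{resid}$

observed: $Y = \text{int} + \beta^* X_o + \text{resid}$

$\mathrm{Var}(X_o) = \sigma_o^2$

No attenuation: $\beta^* = \beta$, because $E(X_t \mid X_o) = X_o$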

Classical

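$X_o = X_t + \delta$, $\delta \sim (0, \sigma_\delta^2)$

true: $Y = \text{int} + \beta X_t + \text{resid}$

observed: $Y = \text{int} + \beta^* X_o + \text{resid}$

$\quad = \text{int} + \beta^* (X_t + \delta) + \text{resid}$

$\Rightarrow \mathrm{Var}(X_o) = \sigma_t^2 + \sigma_\delta^2$ ($X_o$ is stretched out)

Attenuation (attenuation factor $\lambda$)

$\beta^* = \lambda \beta$

$\lambda = \sigma_t^2 / (\sigma_t^2 + \sigma_\delta^2)$

slope = $\mathrm{cov}(Y, X)/\mathrm{Var}(X)$, but $E(X_t \mid X_o) = \lambda X_o$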
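A quick simulation can contrast the two pure forms. It is a sketch with hypothetical values: a true slope of 2 and unit variances for both the true covariate and the error, so that the classical attenuation factor, var(true X) / (var(true X) + var(error)), equals 0.5.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
beta, sd_t, sd_err = 2.0, 1.0, 1.0   # hypothetical slope and standard deviations

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Classical: the observed value is the truth plus noise.
xt = rng.normal(0.0, sd_t, n)
xo = xt + rng.normal(0.0, sd_err, n)
y = 1.0 + beta * xt + rng.normal(0.0, 1.0, n)
slope_classical = ols_slope(xo, y)

# Berkson: the truth is the observed (assigned) value plus noise.
xo_b = rng.normal(0.0, sd_t, n)
xt_b = xo_b + rng.normal(0.0, sd_err, n)
y_b = 1.0 + beta * xt_b + rng.normal(0.0, 1.0, n)
slope_berkson = ols_slope(xo_b, y_b)

attenuation = sd_t**2 / (sd_t**2 + sd_err**2)
```

The classical slope comes out near the attenuation factor times the true slope, while the Berkson slope stays near the true slope, at the cost of a larger residual variance.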

[Figure: Y versus Xt]

[Figure: Y versus Xo]

An illustration

Back to the basic example

  • W = Weight (lb)

  • H = Height (cm)

  • Analysis: simple linear regression

    $W_i = b_0 + b_1 H_i + e_i$ where $e_i \sim N(0, s^2)$

Assume the true model to be:

    $W_i = 3 + 1.0 H_i + e_i$ where $e_i \sim N(0, 8^2)$

Measurement error

  • Error in W: observe $W_i^* = W_i + e_i^*$ where $e_i^* \sim N(0, 4^2)$

  • Error in H: observe $H_i^* = H_i + \delta_i^*$ where $\delta_i^* \sim N(0, 10^2)$

Scenario 1: Measurement Error in Response

Results: b1 = 1.16, SE(b1) = 0.15; b1 = 1.08, SE(b1) = 0.18

  • Standard regression estimate for b1 is unbiased, but less efficient

  • The larger the measurement error, the greater the loss in efficiency

Scenario 2: Measurement Error in H

Results: b1 = 1.16, SE(b1) = 0.15; b1 = 0.69, SE(b1) = 0.21

  • Standard regression estimate for b1 is biased (attenuated)

  • The larger the measurement error, the greater the attenuation
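Both scenarios can be reproduced in a large simulated sample. The true model and the two error standard deviations (4 for weight, 10 for height) come from the illustration; the height distribution (mean 170 cm, SD 12 cm) and the sample size are assumptions, so the numbers will not match the slide's small-sample results.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
h = rng.normal(170.0, 12.0, n)                 # assumed height distribution (cm)
w = 3.0 + 1.0 * h + rng.normal(0.0, 8.0, n)    # true model from the illustration

w_star = w + rng.normal(0.0, 4.0, n)           # Scenario 1: error in the response
h_star = h + rng.normal(0.0, 10.0, n)          # Scenario 2: error in the covariate

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b1_clean = ols_slope(h, w)
b1_resp_error = ols_slope(h, w_star)
b1_cov_error = ols_slope(h_star, w)
expected_attenuation = 12.0**2 / (12.0**2 + 10.0**2)
```

Error in the response leaves the slope near 1, while error in height pulls it towards the attenuation factor implied by the assumed height variance (about 0.59 here).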

Multivariate Measurement Error

$X_o = X_t + \delta$, $\delta \sim (0, \Sigma)$

The Multiple Imputation Algorithm in SAS

The MIANALYZE Procedure

  • Combines the m different sets of the parameter and variance estimates from the m imputations

  • Generates valid inferences about the parameters of interest

    PROC MIANALYZE <options>;

    BY variables;

    VAR variables;

Multiple Imputation Algorithm in SAS

  • PROC MI <options>;

    BY variables;

    FREQ variable;

    MULTINORMAL <options>;

    VAR variables;

  • Available options in PROC MI include: NIMPU=number (default=5)

  • Available options in MULTINORMAL statement:

    METHOD=REGRESSION

    METHOD=PROPENSITY<(NGROUPS=number)>

    METHOD=MCMC<(options)>

    The default is METHOD=MCMC
