
Chapter 3 Naive Cross-Sectional Analysis of Longitudinal Data






Presentation Transcript


  1. Longitudinal Data, Fall 2006. Chapter 3: Naive Cross-Sectional Analysis of Longitudinal Data. Instructors: Alan Hubbard, Nick Jewell

  2. Dental Data

  3. Dental Data

  4. Want to model the effect of gender and age on dental distance. • Yij is the dental distance at the jth measurement on the ith child, • Xij1 is the age of the ith child at the jth measurement, • Xij2 is the gender of the ith child (0 = male, 1 = female) - the same for all j measurements.

  5. Reviewing Notation and the Linear Model • Yij will represent a response variable, the jth measurement on unit i. • xijT = (1, xij1, xij2, ..., xijp) will be a vector of length p+1 of explanatory variables observed at the jth measurement. • j = 1, ..., ni; i = 1, ..., m. • E(Yij) = μij, Var(Yij) = vij.

  6. Nesting Observations (measurement within individual) • The set of repeated measures for unit i is collected into an ni-vector Yi = (Yi1, Yi2, ..., Yini). • Yi has covariance matrix Var(Yi) = Vi. • The jk element of Vi is the covariance between Yij and Yik, that is, Cov(Yij, Yik) = vijk. • Cov(Yij, Yi'k) = 0 for i ≠ i', i.e., measurements on different subjects are uncorrelated.

  7. Combining all observations into a big data set • We will stack the responses of all units into one big vector Y = (Y1, ..., Ym), an N-vector, where N = n1 + ... + nm is the total number of observations. • Most of the course will focus on regression models of the form:

\[ Y_{ij} = x_{ij}^T \beta + e_{ij}, \qquad E(Y_{ij}) = \mu_{ij} = x_{ij}^T \beta \]

  8. Combining, cont. • We can write the model for the ith person as

\[ Y_i = X_i \beta + e_i \]

where Xi is the ni × (p+1) matrix with rows xijT, and for the entire data set as:

\[ Y = X\beta + e \]

where X stacks X1, ..., Xm.

  9. The Variance-Covariance of Yi - Most General

\[
V_i = \begin{pmatrix}
v_{i11} & v_{i12} & v_{i13} & \cdots & v_{i1n_i} \\
v_{i12} & v_{i22} & v_{i23} & \cdots & v_{i2n_i} \\
\vdots  & \vdots  & \vdots  & \ddots & \vdots \\
v_{i1n_i} & v_{i2n_i} & v_{i3n_i} & \cdots & v_{in_in_i}
\end{pmatrix}
\]

  10. The Variance-Covariance of Y

\[
V(Y) = \begin{pmatrix}
V_1 & 0 & 0 & \cdots & 0 \\
0 & V_2 & 0 & \cdots & 0 \\
0 & 0 & V_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & V_m
\end{pmatrix}
\]

  11. Example 1: Compound Symmetry • Assume that the variance of each observation Yij is Var(Yij) = σ². • Cor(Yij, Yij') = ρ for j ≠ j'. • As always, Cor(Yij, Yi'j') = 0 for i ≠ i'.

  12. Example 1: V(Y) - Compound Symmetry • Each block Vi of V(Y) is then

\[
V_i = \sigma^2 \begin{pmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & \vdots & \ddots & \vdots \\
\rho & \rho & \cdots & 1
\end{pmatrix}
\]
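As a toy illustration (not from the original slides), such a compound-symmetry block is a one-liner in Mata; σ² = 4, ρ = 0.5, and n = 4 are made-up values:

mata:
    n = 4; s2 = 4; rho = .5                        // hypothetical values
    V0 = s2 * (rho*J(n, n, 1) + (1-rho)*I(n))      // sigma^2 on the diagonal,
    V0                                             // rho*sigma^2 off it
end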

  13. Ignoring the correlation when estimating the β's and using least squares • What does least squares regression do? It minimizes the squared differences between the observed and predicted outcomes:

\[ RSS(\beta) = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( Y_{ij} - x_{ij}^T \beta \right)^2 \]

• That is, it minimizes the residual sum of squares.

  14. Ordinary Least Squares • The LS solution is (in matrix form):

\[ \hat\beta = (X^T X)^{-1} X^T Y \]

• Does the fact that the data may be correlated (E[eij eik] ≠ 0) make this an erroneous estimate? • The first question to ask is whether this estimate is biased, i.e., does E(β̂) = β?
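A minimal Mata sketch of this matrix solution, assuming the dental data are in memory with the variables used on slide 33 (distance, age, gender, inter):

mata:
    y = st_data(., "distance")
    X = J(rows(y), 1, 1), st_data(., "age gender inter")   // prepend intercept
    b = invsym(cross(X, X)) * cross(X, y)                   // (X'X)^(-1) X'Y
    b
end

The coefficients should match the regress output on slide 33, since regress computes exactly this quantity.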

  15. Expected values of coefficient estimates

\[ E(\hat\beta) = E\left[ (X^T X)^{-1} X^T Y \right] = (X^T X)^{-1} X^T E(Y) \]

• Note that E(Y) is the expectation of a vector, or simply the list of expectations of the components of the vector.

  16. Expectation of Vectors • Now look at the vector of observations made on an individual: E(Yi) = E(Yi1, ..., Yini) = (xi1Tβ, ..., xiniTβ) = Xiβ. • Finally, look at the vectors stacked together: E(Y) = (EY1, EY2, ..., EYm) = (X1β, X2β, ..., Xmβ) = Xβ.

  17. The Answer, Finally!!

\[ E(\hat\beta) = (X^T X)^{-1} X^T X \beta = \beta \]

• That is, the estimator is unbiased, regardless of the correlation among measurements made on the same person i. • From a simple estimation view (at least with regard to bias), ignoring correlation in the linear model might be OK. • It is when inference (standard error calculations) is made about β that ignoring correlation can be a problem.

  18. var(β̂)

\[ \operatorname{var}(\hat\beta) = (X^T X)^{-1} X^T V X (X^T X)^{-1} \qquad (***) \]

where V = V(Y). • So, if one knows V, then the variance of the estimated coefficients is known. • However, one never knows V, so you have to estimate it. • Estimating V will require some assumption about how the Yij's are correlated (the form of the Vi's).

  19. var(β̂), cont.

\[
V(Y_i) = \begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix} = \sigma^2 I
\]

• Typically, in OLS, the observations are assumed to be independent, and so Vi is assumed to look as above.

  20. OLS estimate of var(β̂) • It still remains to estimate σ², which is now the common variance of the Yij's. Note that σ² = E(eij²). • We don't know the eij's, but we can estimate them by the residuals:

\[ r_{ij} = Y_{ij} - x_{ij}^T \hat\beta \]

  21. OLS estimate of var(β̂), cont. • To get an estimate of σ², take an average (sort of) of the squared residuals:

\[ \hat\sigma^2 = \frac{1}{N - (p+1)} \sum_{i=1}^{m} \sum_{j=1}^{n_i} r_{ij}^2 \]

• Then this is plugged back into Vi to get V̂i = σ̂²I.

  22. OLS estimate of var(β̂), cont. • Finally, plug this back into (***); with V̂ = σ̂²I the sandwich collapses to the familiar

\[ \widehat{\operatorname{var}}(\hat\beta) = \hat\sigma^2 (X^T X)^{-1} \]

• This is a matrix whose diagonal elements are the estimated variances of the coefficient estimates. The SE(β̂k) are the square roots of these estimated variances.
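Continuing the Mata sketch (same assumption that the slide-33 variables are in memory), the naive OLS variance estimate of slides 20-22 takes a few more lines; up to rounding it reproduces the Root MSE (2.2569) and the standard errors shown on slide 33:

mata:
    y = st_data(., "distance")
    X = J(rows(y), 1, 1), st_data(., "age gender inter")
    b = invsym(cross(X, X)) * cross(X, y)
    r = y - X*b                                // residuals
    s2 = cross(r, r) / (rows(y) - cols(X))     // sigma-hat^2, N - (p+1) dof
    Vb = s2 * invsym(cross(X, X))              // naive var(beta-hat)
    sqrt(s2), sqrt(diagonal(Vb))'              // Root MSE and naive SEs
end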

  23. Problem!! • Ignoring the correlation, as in the standard errors returned by OLS, can give biased estimates of var(β̂). • This can lead to erroneous confidence intervals and tests. • However, we can still use OLS if we can just repair the estimates of var(β̂). • That means getting a good estimate of V.

  24. Estimating V • First, we will take the case where the measurements are time-structured and the same for each person (ni = n) - the dental example. • For this type of data, one can assume the most general structure for V (unstructured). • The goal is to estimate the individual components of Vi (the vijk) using the OLS residuals.

  25. Form of the Vi • In order to estimate V, we will assume that the individual covariance matrices (the Vi) are all the same: V1 = V2 = ... = Vm. • That is equivalent to saying that vijk = vjk, i.e., the variances and covariances of equivalent observations are the same for every individual. • That way, we only have to estimate one Vi in order to build V. • Thus, we can denote the common matrix V1 = V2 = ... = Vm = V0.

  26. The Variance-Covariance of Y

\[
V(Y) = \begin{pmatrix}
V_0 & 0 & 0 & \cdots & 0 \\
0 & V_0 & 0 & \cdots & 0 \\
0 & 0 & V_0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & V_0
\end{pmatrix}
\]

  27. Estimating the components of V0 • If we observed the errors, then it would be easy to estimate the vjk:

\[ \hat v_{jk} = \frac{1}{m} \sum_{i=1}^{m} e_{ij} e_{ik} \]

• We don't observe the errors, but we can still estimate the covariance of any two measurements on the same person using the residuals:

\[ \hat v_{jk} = \frac{1}{m} \sum_{i=1}^{m} r_{ij} r_{ik} \]

  28. Estimating V0 using Matrix Formulation • Let r be the m × n matrix of residuals in which row i holds the time-ordered residuals for person i:

\[
r = \begin{pmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
r_{21} & r_{22} & \cdots & r_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
r_{m1} & r_{m2} & \cdots & r_{mn}
\end{pmatrix}
\]

• Then

\[ \hat V_0 = \frac{1}{m} r^T r \]
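In Mata this is one line once the residuals are arranged one row per child; a sketch, assuming the slide-33 variables are in memory, the data are sorted by child and measurement occasion, and n = 4 as in the dental data:

mata:
    y = st_data(., "distance")
    X = J(rows(y), 1, 1), st_data(., "age gender inter")
    r = y - X * invsym(cross(X, X)) * cross(X, y)   // OLS residuals
    R = rowshape(r, rows(r)/4)                      // m x n residual matrix
    V0 = (R' * R) / rows(R)                         // n x n estimate of V0
    V0
end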

  29. Summary Algorithm • Perform the OLS regression to get the estimates β̂ and the residuals. • Form the matrix r. • Estimate V0 as V̂0 = rTr/m. • Create the estimate of V (block diagonal with V̂0). • Use the estimate of V in (***) to get the estimate of var(β̂). A Mata sketch of the full algorithm follows below.
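Putting the steps together (same assumptions as before: slide-33 variables in memory, data sorted by child, n = 4 measurements per child), an end-to-end sketch:

mata:
    y = st_data(., "distance")
    X = J(rows(y), 1, 1), st_data(., "age gender inter")
    XtXi = invsym(cross(X, X))
    b = XtXi * cross(X, y)                       // step 1: OLS estimates
    r = y - X*b                                  //         and residuals
    n = 4; m = rows(y)/n
    R = rowshape(r, m)                           // step 2: residual matrix
    V0 = (R' * R) / m                            // step 3: estimate V0
    meat = J(cols(X), cols(X), 0)                // steps 4-5: plug block-
    for (i=1; i<=m; i++) {                       // diagonal V-hat into (***)
        Xi = X[((i-1)*n+1)::(i*n), .]
        meat = meat + Xi' * V0 * Xi
    }
    Vb = XtXi * meat * XtXi
    sqrt(diagonal(Vb))'                          // corrected SEs
end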

  30. Lessons • The LS solution is still (theoretically) unbiased even if there is residual correlation among repeated measurements on a subject. • However, the inference is not unbiased, so one must account for correlation when calculating var(β̂). • One solution for regularly measured data (e.g., all subjects have the same measurements at the same times) is to use the residuals to estimate the variance-covariance matrix of the data, assuming each subject has the same matrix. • However, one can get even more flexible (robust SE's).

  31. Robust SE’s • Instead of assuming that very subject has the same Vi, one can estimate each subjects V-C matrix separately. • Thus, the estimate of the variance of the first measurement on subject i is: • The correlation between the 1st and 2nd measurement on the ith person is estimated as:

  32. Robust SE’s, cont. • This procedure is repeated until all the variances and covariances are estimated for each subject (again, we always assume no between subject correlation). • These are terrible estimators of the variances and covariances. • However, do we care? No we don’t!!! • Because, they still result, under certain assumptions, in a reasonable estimator of • The estimate, , sums over the, which means we gain precision by averaging over the individual estimates of the

  33. Dental Data: No accounting for Correlation after OLS

. gen inter = age*gender
. regress distance age gender inter

      Source |       SS       df       MS              Number of obs =     108
-------------+------------------------------           F(  3,   104) =   25.39
       Model |  387.935027     3  129.311676           Prob > F      =  0.0000
    Residual |  529.757102   104  5.09381829           R-squared     =  0.4227
-------------+------------------------------           Adj R-squared =  0.4061
       Total |   917.69213   107  8.57656196           Root MSE      =  2.2569

------------------------------------------------------------------------------
    distance |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .4795455   .1521635     3.15   0.002     .1777996    .7812913
      gender |  -1.032102   2.218797    -0.47   0.643     -5.43206    3.367855
       inter |   .3048295   .1976661     1.54   0.126    -.0871498    .6968089
       _cons |   17.37273   1.708031    10.17   0.000     13.98564    20.75982
------------------------------------------------------------------------------

  34. Dental Data: Correcting for Correlation After OLS

. xtgee distance age gender inter, i(child) cor(ind) ro

GEE population-averaged model                   Number of obs      =       108
Group variable:                      child      Number of groups   =        27
Link:                             identity      Obs per group: min =         4
Family:                           Gaussian                     avg =       4.0
Correlation:                   independent                     max =         4
                                                Wald chi2(3)       =    148.77
Scale parameter:                  4.905158      Prob > chi2        =    0.0000

Pearson chi2(108):                  529.76      Deviance           =    529.76
Dispersion (Pearson):             4.905158      Dispersion         =  4.905158

                           (standard errors adjusted for clustering on child)
------------------------------------------------------------------------------
             |             Semi-robust
    distance |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .4795455   .0643352     7.45   0.000     .3534507    .6056402
      gender |  -1.032102   1.404031    -0.74   0.462    -3.783952    1.719748
       inter |   .3048295   .1190935     2.56   0.010     .0714105    .5382486
       _cons |   17.37273    .739021    23.51   0.000     15.92427    18.82118
------------------------------------------------------------------------------
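For comparison (not from the original slides): the same cluster-robust idea is available directly in regress via vce(cluster); the coefficients are identical to OLS, and the standard errors should agree with the output above up to Stata's small-sample adjustment:

. regress distance age gender inter, vce(cluster child)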

  35. Dental: Resulting Fit
