
Estimation of constant-CV regression models


Presentation Transcript


  1. Estimation of constant-CV regression models

     Alan H. Feiveson
     NASA – Johnson Space Center, Houston, TX
     SNASUG 2008, Chicago, IL

  2. Variance Models with Simple Linear Regression

     yi = b0 + b1 xi + ei
     yi = b0 + b1 xi + ei,   V(ei) = s^2
     yi = b0 + b1 xi + ei,   V(ei) = s^2 (b0 + b1 xi)^2
     y  = Xb + Zu

  3. yi = b0 + b1 xi + ei,   V(ei) = s^2 (b0 + b1 xi)^2

     Example: b0 = 1.0, b1 = 0.5, s = 0.10

. clear
. set obs 100
. gen x = 10*uniform()
. gen mu = 1 + .5*x
. gen y = mu + .10*mu*invnorm(uniform())

Problem: Estimate b0, b1, and s.
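As a quick check on the simulated data (this step is not in the deck), the relative errors (y - mu)/mu should have standard deviation near s = 0.10:

. gen cv_check = (y - mu)/mu    // relative error; its s.d. approximates s
. summ cv_check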

  4. Variance Stabilization

     yi = b0 + b1 xi + ei,   V(ei) = s^2 (b0 + b1 xi)^2

     Taking logs stabilizes the variance, but the mean is no longer linear in x:
     E(log yi) = g(b0, b1, xi)
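A sketch of the delta-method reasoning behind this slide (the expansion itself is not spelled out in the deck): writing yi = mu_i (1 + s ui) with mu_i = b0 + b1 xi and ui standard normal,

     log yi = log mu_i + log(1 + s ui) ≈ log(b0 + b1 xi) + s ui,

so for small s the variance of log yi is approximately the constant s^2, while its mean g(b0, b1, xi) ≈ log(b0 + b1 xi) is nonlinear in xi; hence the polynomial approximation on the next slide.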

  5. Approximate g(b0, b1, xi) by a polynomial in x, then do OLS regression of log y on x:

. gen z = log(y)
. gen x2 = x*x
. reg z x x2
. predict z_hat

[True values: b0 = 1.0, b1 = 0.5, s = 0.10]

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =  630.65
       Model |  15.9635043     2  7.98175216           Prob > F      =  0.0000
    Residual |  1.22767564    97   .01265645           R-squared     =  0.9286
-------------+------------------------------           Adj R-squared =  0.9271
       Total |    17.19118    99  .173648282           Root MSE      =   .1125

------------------------------------------------------------------------------
           z |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .2707448    .018873    14.35   0.000     .2332872    .3082025
          x2 |  -.0110946   .0017622    -6.30   0.000    -.0145921   -.0075972
       _cons |   .1783796   .0441957     4.04   0.000     .0906634    .2660958
------------------------------------------------------------------------------
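The Root MSE of .1125 on the log scale is already close to the true s = 0.10, consistent with the delta-method sketch above. One way to gauge the quadratic approximation (again, a check I am adding, not one from the deck) is to compare z_hat with the true log-mean of the simulated data:

. gen z_true = log(1 + .5*x)       // true g(b0, b1, x) for this simulation
. gen approx_err = z_hat - z_true  // error of the quadratic-in-x approximation
. summ approx_err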

  6. But what about b0 and b1?

  7. Alternative: iteratively re-weighted regression

. reg y x
. predict muh
. reg y x [w=1/muh^2]
. local rmse = e(rmse)
. gen wt = 1/muh^2
. summ wt
. local wbar = r(mean)
. local sigh = sqrt(`wbar')*`rmse'
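Slides 8-11 carry out this iteration by hand. For reference, here is the whole procedure as one runnable sketch; the forvalues wrapper and the choice of three iterations are my additions, not code from the deck:

. reg y x                                 // iteration 0: unweighted OLS
. forvalues it = 1/3 {
.     capture drop wt
.     gen wt = 1/(_b[_cons] + _b[x]*x)^2  // weights from current fitted means
.     reg y x [aw=wt]                     // re-fit with updated weights
. }
. summ wt
. di e(rmse)*sqrt(r(mean))                // rescale Root MSE to estimate s

Because analytic weights are renormalized internally, the Root MSE must be rescaled by the square root of the mean weight to recover s, which is what the final line (and slide 11) does.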

  8. ITERATION 0

. reg y x

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =  832.95
       Model |  172.307709     1  172.307709           Prob > F      =  0.0000
    Residual |   20.272763    98  .206864928           R-squared     =  0.8947
-------------+------------------------------           Adj R-squared =  0.8937
       Total |  192.580472    99  1.94525729           Root MSE      =  .45482

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .518296   .0179585    28.86   0.000      .482658    .5539339
       _cons |   .9530661   .1014154     9.40   0.000     .7518106    1.154322
------------------------------------------------------------------------------

. gen wt = 1/(_b[_cons] + _b[x]*x)^2

  9. ITERATION 1

. reg y x [w=wt]
(analytic weights assumed)
(sum of wgt is 1.3046e+01)

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) = 1278.10
       Model |  123.561274     1  123.561274           Prob > F      =  0.0000
    Residual |  9.47421712    98  .096675685           R-squared     =  0.9288
-------------+------------------------------           Adj R-squared =  0.9281
       Total |  133.035492    99  1.34379284           Root MSE      =  .31093

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .510555    .014281    35.75   0.000     .4822147    .5388952
       _cons |   .9877915   .0533903    18.50   0.000     .8818402    1.093743
------------------------------------------------------------------------------

. replace wt = 1/(_b[_cons] + _b[x]*x)^2
(100 real changes made)

 10. ITERATION 2

. reg y x [w=wt]
(analytic weights assumed)
(sum of wgt is 1.2842e+01)

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) = 1271.77
       Model |  124.885941     1  124.885941           Prob > F      =  0.0000
    Residual |  9.62343467    98  .098198313           R-squared     =  0.9285
-------------+------------------------------           Adj R-squared =  0.9277
       Total |  134.509376    99  1.35868057           Root MSE      =  .31337

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .510746   .0143219    35.66   0.000     .4823247    .5391674
       _cons |   .9870233   .0540361    18.27   0.000     .8797904    1.094256
------------------------------------------------------------------------------

. replace wt = 1/(_b[_cons] + _b[x]*x)^2
(100 real changes made)

 11. ITERATION 3

. noi reg y x [w=wt]
(analytic weights assumed)
(sum of wgt is 1.2846e+01)

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) = 1271.92
       Model |  124.855967     1  124.855967           Prob > F      =  0.0000
    Residual |   9.6200215    98  .098163485           R-squared     =  0.9285
-------------+------------------------------           Adj R-squared =  0.9277
       Total |  134.475988    99  1.35834331           Root MSE      =  .31331

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5107417   .0143209    35.66   0.000     .4823223    .5391612
       _cons |   .9870406   .0540213    18.27   0.000     .8798371    1.094244
------------------------------------------------------------------------------

. summ wt

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          wt |       100    .1284623    .1209472   .0269917   .5841107

. local wbar = r(mean)
. noi di e(rmse)*sqrt(`wbar')
.11229563

[True values: b0 = 1.0, b1 = 0.5, s = 0.10]

 12. Can we do this using -xtmixed- ?

     yi = b0 + b1 xi + s(b0 + b1 xi)ui
        = b0 + b1 xi + c0 u0i + c1 xi u1i,   where u0i = u1i and c1/c0 = b1/b0

. xtmixed y x ||???: x

• How do we get -xtmixed- to estimate a non-constant residual variance?
• Degenerate dependency of random effects (u0i = u1i).
• Coefficients of the random intercept and slope (c0 and c1) need to be constrained.
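To see why these constraints reproduce the constant-CV model (a step the deck leaves implicit): with u0i = u1i = ui and c1/c0 = b1/b0,

     c0 u0i + c1 xi u1i = (c0 + c1 xi) ui = (c0/b0)(b0 + b1 xi) ui,

so the error variance is (c0/b0)^2 (b0 + b1 xi)^2, i.e. the constant-CV model with s = c0/b0.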

 13. Can we do this using -xtmixed- ?
     How do we get -xtmixed- to estimate a non-constant residual variance?

     Ex. 1: Ignore dependency of u's and constraints on c's:

     yi = b0 + b1 xi + c0 u0i + c1 xi u1i

set obs 1000
gen x = 5*uniform()
gen mu = 3 + 1.4*x
gen u0 = invnorm(uniform())
gen u1 = invnorm(uniform())
gen y = mu + 0.05*u0 + 0.50*x*u1
gen ord = _n
xtmixed y x ||ord: x, noc

 14. . xtmixed y x ||ord: x, noc nolog

Mixed-effects REML regression                   Number of obs      =      1000
Group variable: ord                             Number of groups   =      1000

                                                Obs per group: min =         1
                                                               avg =       1.0
                                                               max =         1

                                                Wald chi2(1)       =   5745.72
Log restricted-likelihood = -1444.8061          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.404595   .0185301    75.80   0.000     1.368276    1.440913
       _cons |     3.0182   .0116741   258.54   0.000      2.99532    3.041081
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
ord: Identity                |
                       sd(x) |   .5058616   .0118386      .4831824    .5296053
-----------------------------+------------------------------------------------
                sd(Residual) |   .0495162   .0143015      .0281125    .0872159
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =  718.85  Prob >= chibar2 = 0.0000

[True values: b0 = 3.0, b1 = 1.4, c0 = 0.05, c1 = 0.50]

 15. Can we do this using -xtmixed- ?
     How do we get -xtmixed- to estimate a non-constant residual variance?

     Ex. 2: No random intercept, covariate known:

     yi = b0 + b1 xi + c1 zi u1i

set obs 1000
gen x = 5*uniform()
gen z = 3 + 1.4*x
gen u1 = invnorm(uniform())
gen y = 3 + 1.4*x + 0.50*z*u1
gen ord = _n
xtmixed y x ||ord: z, noc

 16. Can we do this using -xtmixed- ?
     How do we get -xtmixed- to estimate a non-constant residual variance?

     Ex. 2: No random intercept, covariate known:

     yi = b0 + b1 xi + c1 zi u1i

xtmixed y x ||ord: z, noc

Garbage!

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:  log restricted-likelihood = -2506.6255
              numerical derivatives are approximate
              flat or discontinuous region encountered
Iteration 1:  log restricted-likelihood = -2503.1832
              numerical derivatives are approximate

(These data were simulated with no residual error term, so the residual variance
-xtmixed- must estimate is zero, on the boundary of the parameter space; the
likelihood is flat there and the optimizer founders.)

 17. Can we do this using -xtmixed- ?
     How do we get -xtmixed- to estimate a non-constant residual variance?

     Ex. 2: No random intercept, covariate known:

     yi = b0 + b1 xi + c1 zi u1i

The fix: replicate each observation and add a tiny artificial residual, so that
the residual variance is identified (and essentially zero):

expand 3
sort ord
gen yf = y + .001*invnorm(uniform())
xtmixed yf x ||ord: z, noc nolog

 18. . xtmixed yf x ||ord: z, noc nolog

Mixed-effects REML regression                   Number of obs      =      3000
Group variable: ord                             Number of groups   =      1000

                                                Obs per group: min =         3
                                                               avg =       3.0
                                                               max =         3

                                                Wald chi2(1)       =    484.13
Log restricted-likelihood =  7952.0717          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
          yf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.393598    .063337    22.00   0.000      1.26946    1.517736
       _cons |   3.071291   .1252851    24.51   0.000     2.825737    3.316846
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
ord: Identity                |
                       sd(z) |   .4814862   .0107771       .46082     .5030792
-----------------------------+------------------------------------------------
                sd(Residual) |   .0009896   .0000156      .0009594    .0010207
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 31334.04 Prob >= chibar2 = 0.0000

[True values: b0 = 3.0, b1 = 1.4, c1 = 0.50]

 19. Can we do this using -xtmixed- ?

     Ex. 3: No random intercept (unknown covariate)

     yi = b0 + b1 xi + s(b0 + b1 xi)ui
        = b0 + b1 xi + c0 u0i + c1 xi u1i,   where u0i = u1i and c1/c0 = b1/b0

• Degenerate dependency of random effects (u0i = u1i).
• Coefficients of the random intercept and slope (c0 and c1) need to be constrained.

 20. Can we do this using -xtmixed- ?

     Ex. 3: No random intercept (unknown covariate)

     yi = b0 + b1 xi + s(b0 + b1 xi)ui
        = b0 + b1 xi + c0 u0i + c1 xi u1i,   where u0i = u1i and c1/c0 = b1/b0

• Degenerate dependency of random effects (u0i = u1i).
• Coefficients of the random intercept and slope (c0 and c1) need to be constrained.

Recast the model with one error term and pretend zi = b0 + b1 xi is known.
Then iterate.

 21. Can we do this using -xtmixed- ?

     yi = b0 + b1 xi + s(b0 + b1 xi)ui = b0 + b1 xi + c1 zi u1i

1. Expand and introduce an artificial "residual" error term.

. expand 3
. gen yf = y + .001*invnorm(uniform())

 22. Can we do this using -xtmixed- ?

     yi = b0 + b1 xi + s(b0 + b1 xi)ui = b0 + b1 xi + c1 zi u1i

1. Expand and introduce an artificial "residual" error term.
2. Estimate zi by OLS or other "easy" method.

. expand 3
. gen yf = y + .001*invnorm(uniform())
. reg y x
. predict zh

 23. Can we do this using -xtmixed- ?

     yi = b0 + b1 xi + c1 zi u1i     [zi = b0 + b1 xi is unknown]

1. Expand and introduce an artificial "residual" error term.
2. Estimate zi by OLS or other "easy" method.
3. Fit the model pretending the prediction zhi is the actual zi.

. expand 3
. gen yf = y + .001*invnorm(uniform())
. reg y x
. predict zh
. xtmixed yf x ||ord: zh, noc nolog

 24. Can we do this using -xtmixed- ?

     yi = b0 + b1 xi + c1 zi u1i     [zi = b0 + b1 xi is unknown]

1. Expand and introduce an artificial "residual" error term.
2. Estimate zi by OLS or other "easy" method.
3. Fit the model pretending the prediction zhi is the actual zi.
4. Iterate (see the consolidated sketch below).

. expand 3
. gen yf = y + .001*invnorm(uniform())
. reg y x
. predict zh
. xtmixed yf x ||ord: zh, noc nolog
. drop zh
. predict zh
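Taken together, steps 1-4 amount to something like the following loop; the forvalues wrapper and the iteration count are my additions (the deck shows a single update):

. expand 3
. sort ord
. gen yf = y + .001*invnorm(uniform())    // step 1: artificial residual
. reg y x
. predict zh                              // step 2: initial OLS estimate of z
. forvalues it = 1/3 {
.     xtmixed yf x ||ord: zh, noc nolog   // step 3: fit with the current zh
.     drop zh
.     predict zh                          // step 4: update zh and repeat
. }

After -xtmixed-, -predict- with no options returns the fixed-portion linear prediction (xb), so each pass refreshes zh with the latest estimate of b0 + b1 xi.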

 25. . xtmixed yf x ||ord: zh, noc

Mixed-effects REML regression                   Number of obs      =      3000
Group variable: ord                             Number of groups   =      1000

                                                Obs per group: min =         3
                                                               avg =       3.0
                                                               max =         3

                                                Wald chi2(1)       =    484.01
Log restricted-likelihood =  7952.4049          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
          yf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    1.39339   .0633354    22.00   0.000     1.269255    1.517525
       _cons |   3.071699   .1261407    24.35   0.000     2.824467     3.31893
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
ord: Identity                |
                      sd(zh) |   .4764546   .0106645      .4560044    .4978219
-----------------------------+------------------------------------------------
                sd(Residual) |   .0009896   .0000156      .0009594    .0010207
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 31334.71 Prob >= chibar2 = 0.0000

[True values: b0 = 3.0, b1 = 1.4, c1 = 0.50]

 26. 2-level model

     z  = E(yij | xij)          (level-1 mean)
     zi = E(yij | xij, u1i)     (level-2 mean)

     yij = b0 + b1 xij + c1(b0 + b1 xij)u1i
           + c2[b0 + b1 xij + c1(b0 + b1 xij)u1i]u2ij

args NS nr be0 be1 c1 c2
drop _all
set obs `NS'
gen id = _n
gen u1 = invnorm(uniform())
expand `nr'
sort id
gen u2 = invnorm(uniform())
gen x = 10*uniform()
gen z = `be0' + `be1'*x
gen zi = z + `c1'*z*u1
gen y = zi + `c2'*zi*u2
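The nesting keeps the coefficient of variation constant at both levels; a short derivation (not shown in the deck): with zij = b0 + b1 xij the model factors as

     yij = zij (1 + c1 u1i)(1 + c2 u2ij),

so E(yij | xij) = zij and, with independent standard-normal u's,

     V(yij | xij) = zij^2 [(1 + c1^2)(1 + c2^2) - 1] = zij^2 (c1^2 + c2^2 + c1^2 c2^2),

i.e. the standard deviation is proportional to the conditional mean at each level.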

  27. 2-level model (example)

 28. // [continuing from: gen y = zi + `c2'*zi*u2]

gen obs = _n
expand 3
sort obs
gen yf = y + .001*invnorm(uniform())

* iteration 0
xtmixed y x ||id: x, noc nolog
predict zh0
predict uh1i_0, reffects level(id)
gen zhi_0 = zh0 + uh1i_0

* iteration 1
xtmixed yf x ||id: zh0, noc ||obs: zhi_0, noc nolog
predict zh1
predict uh1i_1, reffects level(id)
gen zhi_1 = zh1 + uh1i_1

* iteration 2
xtmixed yf x ||id: zh1, noc ||obs: zhi_1, noc nolog
predict zh2
predict uh1i_2, reffects level(id)
gen zhi_2 = zh2 + uh1i_2

* iteration 3
noi xtmixed yf x ||id: zh2, noc ||obs: zhi_2, noc nolog
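The same update written as a loop rather than unrolled (a sketch; the variable handling follows the slide, but the loop itself is my addition):

xtmixed y x ||id: x, noc nolog          // iteration 0
predict zh
predict uh1i, reffects level(id)
gen zhi = zh + uh1i
forvalues it = 1/3 {
    xtmixed yf x ||id: zh, noc ||obs: zhi, noc nolog
    drop zh uh1i zhi
    predict zh                          // update the fixed-portion prediction
    predict uh1i, reffects level(id)    // update the id-level BLUP
    gen zhi = zh + uh1i
}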

 29. . run nasug_2008_sim1 20 5 1.0 1.0 .2 .05 6 1

Successive estimates of sd(zh) and sd(zhi) over the six iterations:

  .21211063   .05062864
  .21216237   .05076224
  .21213417   .05075685
  .21213363   .05075686
  .21213354   .05075685
  .21213353   .05075685

Mixed-effects REML regression                   Number of obs      =       300

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
             id |       20         15       15.0         15
            obs |      100          3        3.0          3
-----------------------------------------------------------

 30. . run nasug_2008_sim1 20 5 1.0 1.0 .2 .05 6 1   (continued)

                                                Wald chi2(1)       =    421.17
Log restricted-likelihood =  1038.0836          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
          yf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.087757   .0530034    20.52   0.000     .9838727    1.191642
       _cons |   1.039358   .0535761    19.40   0.000     .9343512    1.144366
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                      sd(zh) |   .2121335   .0348399      .1537487    .2926895
-----------------------------+------------------------------------------------
obs: Identity                |
                     sd(zhi) |   .0507568   .0040389      .0434271    .0593237
-----------------------------+------------------------------------------------
                sd(Residual) |   .0009429   .0000471      .0008549      .00104
------------------------------------------------------------------------------
LR test vs. linear regression:     chi2(2) =  2866.48   Prob > chi2 =  0.0000

 31. Bayesian Estimation (WinBUGS)

     [Figure: WinBUGS posterior plots for b1, b0, c2, and c1]

 32. WinBUGS

     node    mean      sd        2.5%     median   97.5%     start   sample
     be1     1.077     0.04988   0.9847   1.076    1.169     10001   10000
     be0     1.030     0.05257   0.9317   1.030    1.131     10001   10000
     c1      0.217     0.03455   0.1618   0.2127   0.2958    10001   10000
     c2      0.064     0.00465   0.0556   0.0637   0.07353   10001   10000

STATA (xtmixed)

          yf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.087757   .0530034    20.52   0.000     .9838727    1.191642
       _cons |   1.039358   .0535761    19.40   0.000     .9343512    1.144366
------------------------------------------------------------------------------

  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                      sd(xb) |   .2121335   .0348399      .1537487    .2926895
-----------------------------+------------------------------------------------
obs: Identity                |
                    sd(muhi) |   .0507568   .0040389      .0434271    .0593237
-----------------------------+------------------------------------------------

Both approaches come close to the simulation's true values (be0 = be1 = 1.0,
c1 = 0.2, c2 = 0.05): WinBUGS gives c1 = 0.217 and c2 = 0.064, versus 0.212
and 0.051 from -xtmixed-.
