Sociology 601 Class 25: November 24, 2009

verrill

  1. Sociology 601 Class 25: November 24, 2009
     • Homework 9
     • Review
       • dummy variable example from ASR (finish)
       • regression results for dummy variables
     • Quadratic effects
       • example: earnings and age
       • plotting
     • F-tests comparing models
     • Example from Sociology of Religion

  2. Review: Regression with Dummy Variables
     Create dummy variables for age. Why? Age is an interval variable, so what advantage is there to creating a series of dummies?

     gen byte age25=0 if age<.   /* new variable, age25, will be missing if age is missing */
     replace age25=1 if age>=25 & age<=29
     gen byte age30=0 if age<.
     replace age30=1 if age>=30 & age<=34
     gen byte age35=0 if age<.
     replace age35=1 if age>=35 & age<=39
     gen byte age40=0 if age<.
     replace age40=1 if age>=40 & age<=44
     gen byte age45=0 if age<.
     replace age45=1 if age>=45 & age<=49
     gen byte age50=0 if age<.
     replace age50=1 if age>=50 & age<=54
     * check age dummies (agecheck should =1 for all cases)
     egen byte agecheck=rowtotal(age25-age50)
     tab agecheck, missing

  3. Stata Shortcut for Dummy Variables
     * restrict the sample first, so that age1-age6 correspond to ages 25-29 through 50-54
     drop if age<25 | age>54
     gen byte agecat= floor(age/5)*5
     tab agecat, gen(age)
     * floor function deletes decimal places:
     * e.g., at age 23: floor(23/5)*5 = floor(4.6)*5 = 4*5 = 20
     * check age dummies (agecheck should =1 for all cases)
     egen byte agecheck=rowtotal(age1-age6)
     tab agecheck, missing

  4. Regression with Age Dummy Variables

     . regress conrinc age2-age6 if sex==1

           Source |       SS       df       MS              Number of obs =     725
     -------------+------------------------------           F(  5,   719) =   12.79
            Model |  3.8044e+10     5  7.6089e+09           Prob > F      =  0.0000
         Residual |  4.2773e+11   719   594895739           R-squared     =  0.0817
     -------------+------------------------------           Adj R-squared =  0.0753
            Total |  4.6577e+11   724   643334846           Root MSE      =   24390

     ------------------------------------------------------------------------------
          conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
     -------------+----------------------------------------------------------------
             age2 |   8220.236   3143.413     2.62   0.009     2048.872     14391.6
             age3 |    16495.6   3122.571     5.28   0.000     10365.16    22626.05
             age4 |    17274.8    3112.55     5.55   0.000     11164.03    23385.57
             age5 |   21532.53   3288.812     6.55   0.000      15075.7    27989.35
             age6 |   20013.57   3406.607     5.87   0.000     13325.48    26701.66
            _cons |    26954.2   2325.541    11.59   0.000     22388.54    31519.86
     ------------------------------------------------------------------------------

     Same R-squared and overall F, but different b's and t's (although same relative order):

     . regress conrinc age1-age5 if sex==1

           Source |       SS       df       MS              Number of obs =     725
     -------------+------------------------------           F(  5,   719) =   12.79
            Model |  3.8044e+10     5  7.6089e+09           Prob > F      =  0.0000
         Residual |  4.2773e+11   719   594895739           R-squared     =  0.0817
     -------------+------------------------------           Adj R-squared =  0.0753
            Total |  4.6577e+11   724   643334846           Root MSE      =   24390

     ------------------------------------------------------------------------------
          conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
     -------------+----------------------------------------------------------------
             age1 |  -20013.57   3406.607    -5.87   0.000    -26701.66   -13325.48
             age2 |  -11793.33   3266.455    -3.61   0.000    -18206.26   -5380.405
             age3 |  -3517.968   3246.403    -1.08   0.279    -9891.531    2855.595
             age4 |  -2738.771   3236.766    -0.85   0.398    -9093.413    3615.872
             age5 |   1518.956   3406.607     0.45   0.656     -5169.13    8207.043
            _cons |   46967.77   2489.343    18.87   0.000     42080.52    51855.02
     ------------------------------------------------------------------------------
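
The two regressions above differ only in which age group is omitted, so each coefficient in the second table equals the corresponding coefficient in the first minus the age6 coefficient. A minimal sketch of the same comparison using factor-variable notation follows. It assumes the agecat variable from the shortcut slide (values 25, 30, ..., 50) is in memory, and that Stata 11 or later is available; neither assumption comes from the slides.

```stata
* Assumption: agecat takes the values 25, 30, ..., 50 (see the shortcut slide).
* Factor-variable notation (i. and ib#.) requires Stata 11 or later.

* omit the lowest category (ages 25-29), as in the first table
regress conrinc i.agecat if sex==1

* omit the 50-54 category instead, as in the second table
regress conrinc ib50.agecat if sex==1

* changing the reference group shifts every coefficient by the same amount:
* age5 relative to age6
display 21532.53 - 20013.57
* the new intercept is the mean for the 50-54 group
display 26954.2 + 20013.57
```

The two display lines should reproduce, up to rounding, the age5 coefficient and the constant reported in the second table.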

  5. Plot Earnings by Age

     . tab age, sum(conrinc)

                 |  Summary of respondent income in
          age of |         constant dollars
      respondent |        Mean   Std. Dev.       Freq.
     ------------+------------------------------------
              25 |   16277.936   10757.323          47
              26 |     22712.5   12540.689          46
              27 |   21188.725   11802.539          40
              28 |   25593.444    18395.24          54
              29 |   27021.244   17314.169          45
              30 |   29687.902   16242.466          61
              31 |   30723.709   21631.857          55
              32 |   30218.871   19739.067          62
              33 |   26096.263   15751.154          57
              34 |    30685.51       20528          51
              35 |   37709.106   26704.259          47
              36 |   29178.255   21877.287          51
              37 |   33702.843    20378.26          70
              38 |   39046.871   30994.531          62
              39 |   40338.326   29449.024          43
              40 |   35442.909   23448.711          55
              41 |   38218.979   31804.641          48
              42 |   34377.678   26582.113          59
              43 |   37867.069   25189.647          58
              44 |   34885.268    23017.34          41
              45 |   35212.378   20559.449          45
              46 |   41641.308   28233.297          39
              47 |    39708.14   29503.584          50
              48 |   41391.807   26493.252          57
              49 |   38324.964   23601.741          55
              50 |   42443.892   29193.688          37
              51 |   37255.357   25395.935          42
              52 |   35165.655   20471.181          29
              53 |   44005.892   30812.439          37
              54 |   36918.065   26556.129          31
     ------------+------------------------------------
           Total |   33571.775   24047.119        1474
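
The slide title refers to a plot, but only the table of means survives in the transcript. One possible way to draw such a plot from the same data is sketched below; the variable name meaninc and the axis titles are illustrative choices, not taken from the slides.

```stata
* Sketch only: meaninc is a hypothetical variable name.
* preserve/restore keeps the original data intact around the collapse.
preserve
* one observation per age, holding the mean of conrinc at that age
collapse (mean) meaninc=conrinc, by(age)
* connected line plot of mean income against age
twoway connected meaninc age, ytitle("Mean income (constant dollars)") xtitle("Age of respondent")
restore
```

Plotting the age-specific means before fitting anything makes it easier to judge whether a straight line or a curve is the more plausible description.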

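  6. Regression Test for Curvilinearity
     • Test whether x has a curvilinear relationship with y.
     • Testing for a quadratic relationship is the most common, but not the only, method of testing for curvilinearity.
     • $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$
     • Test whether $\beta_2 \neq 0$:
       • if $\beta_2 > 0$, then U-shaped curve (or part of one)
       • if $\beta_2 < 0$, then inverted-U curve (or part of one)
       • if $\beta_2$ is not significantly different from 0 in either direction, then revert to the linear equation by dropping $x^2$
     • $\beta_1$ is rather irrelevant in this test.
       • if neither $\beta_2$ nor $\beta_1$ is significant at the .05 level in the quadratic model, that does not mean there is no linear relationship: x and $x^2$ are highly correlated, so re-estimate the linear model before drawing a conclusion.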

  7. Curvilinear Regression Equation: $\beta_2$
     $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$
     $\beta_2$ (the quadratic coefficient) determines how steeply the curve accelerates:
     $y = 2x^2$; $y = x^2$; $y = 0.5x^2$

  8. Curvilinear Regression Equation: $\beta_2 < 0$
     $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$
     If $\beta_2$ (the quadratic coefficient) is less than 0, the curve is an inverted U:
     $y = -2x^2$; $y = -x^2$; $y = -0.5x^2$

  9. Curvilinear Regression Equation: Inflexion Point = Maximum | Minimum
     $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$
     inflexion point (strictly, the turning point or vertex of the parabola) = value of x when y is a maximum or minimum = $-\beta_1 / (2\beta_2)$
     $y = -20x^2 + 800x + 62000$: inflexion = $-800 / (2 \times -20) = 20$ (i.e., below the observed x values)
     $y = -100x^2 + 8000x - 90000$: inflexion = $-8000 / (2 \times -100) = 40$ (i.e., within the x range)
     $y = -20x^2 + 2400x + 800$: inflexion = $-2400 / (2 \times -20) = 60$ (i.e., above the observed x values)
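
The formula for the turning point is stated on this slide without derivation. It follows from setting the first derivative of the quadratic to zero, as sketched here; the symbol $x^*$ for the turning point is introduced only for this illustration.

```latex
\[
\frac{dy}{dx} = \beta_1 + 2\beta_2 x = 0
\quad\Longrightarrow\quad
x^* = -\frac{\beta_1}{2\beta_2}
\]
\[
\frac{d^2y}{dx^2} = 2\beta_2
\quad
\begin{cases}
< 0 & \text{maximum (inverted U)} \\
> 0 & \text{minimum (U shape)}
\end{cases}
\]
% Check against the second example on the slide:
% y = -100x^2 + 8000x - 90000 gives x^* = -8000 / (2 \times -100) = 40.
```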

  10. Curvilinear Regression Equation: Inflexion Point = Maximum | Minimum
     $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$
     For completeness, when $\beta_2$ is positive:
     inflexion point = value of x when y is a maximum or minimum = $-\beta_1 / (2\beta_2)$
     $y = 20x^2 - 800x + 50000$: inflexion = $-(-800) / (2 \times 20) = 20$ (i.e., below the observed x values)
     $y = 100x^2 - 8000x + 205000$: inflexion = $-(-8000) / (2 \times 100) = 40$ (i.e., within the x range)
     $y = 20x^2 - 2400x + 114000$: inflexion = $-(-2400) / (2 \times 20) = 60$ (i.e., above the observed x values)

  11. Example: Regression with Curvilinear Age

     . gen int agesq=age*age
     . summarize age agesq

         Variable |       Obs        Mean    Std. Dev.       Min        Max
     -------------+--------------------------------------------------------
              age |      1860    38.84355    8.309941         25         54
            agesq |      1860    1577.839     655.309        625       2916

     . regress conrinc age agesq if sex==1

           Source |       SS       df       MS              Number of obs =     725
     -------------+------------------------------           F(  2,   722) =   32.08
            Model |  3.8016e+10     2  1.9008e+10           Prob > F      =  0.0000
         Residual |  4.2776e+11   722   592463841           R-squared     =  0.0816
     -------------+------------------------------           Adj R-squared =  0.0791
            Total |  4.6577e+11   724   643334846           Root MSE      =   24341

     ------------------------------------------------------------------------------
          conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
     -------------+----------------------------------------------------------------
              age |   4764.733   1134.778     4.20   0.000     2536.875    6992.591
            agesq |  -50.27083   14.30126    -3.52   0.000    -78.34785   -22.19381
            _cons |  -65221.92   21786.08    -2.99   0.003    -107993.6   -22450.29
     ------------------------------------------------------------------------------

     $t_{agesq} = -3.52$; p < .001, so: curvilinear;
     $b_{agesq}$ is negative, so: inverted U;
     inflexion point = $-b_{age} / (2 \times b_{agesq})$ = -4764.7 / (2 × -50.27) = 47.4,
     so maximum earnings at about age 47 and a half.
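
The turning point on this slide is computed by hand from the printed coefficients. A minimal sketch of how the same quantity could be obtained from the stored estimates is given below, assuming the quadratic regression has just been run on the same data. The nlcom step and the plot are additions for illustration and are not part of the original slides.

```stata
* re-run the quadratic model quietly so that its estimates are in memory
quietly regress conrinc age agesq if sex==1

* point estimate of the turning point, -b_age / (2 * b_agesq)
display -_b[age] / (2*_b[agesq])

* the same quantity with a delta-method standard error and confidence interval
nlcom -_b[age] / (2*_b[agesq])

* draw the fitted quadratic over the observed age range (25 to 54)
twoway function y = _b[_cons] + _b[age]*x + _b[agesq]*x^2, range(25 54) ytitle("Predicted income") xtitle("Age")
```

The confidence interval from nlcom is a useful check on whether the estimated maximum falls clearly inside the observed age range or only near its edge.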

  12. Cubic Polynomials
     • Occasionally (actually, rarely), it is worthwhile to investigate whether a more complex polynomial would better describe the curvilinear relationship.
     • Add a cubic term ($x^3$) to the previous quadratic equation:
       • $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + e_i$
     • Test $\beta_3 \neq 0$:
       • if you can't show $\beta_3 \neq 0$ (that is, p > .05 for $\beta_3$), then revert to the quadratic model, and don't interpret $\beta_2$ and $\beta_1$ from the cubic model
       • if $\beta_3 \neq 0$, then the curve can have two bends (although not necessarily over the range of observed x's)

  13. Cubic Polynomials: Earnings and Age Example

     . regress conrinc age agesq agecu if sex==1

           Source |       SS       df       MS              Number of obs =     725
     -------------+------------------------------           F(  3,   721) =   21.36
            Model |  3.8020e+10     3  1.2673e+10           Prob > F      =  0.0000
         Residual |  4.2775e+11   721   593278929           R-squared     =  0.0816
     -------------+------------------------------           Adj R-squared =  0.0778
            Total |  4.6577e+11   724   643334846           Root MSE      =   24357

     ------------------------------------------------------------------------------
          conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
     -------------+----------------------------------------------------------------
              age |   3971.837    8901.06     0.45   0.656    -13503.26    21446.93
            agesq |  -29.64795   230.0667    -0.13   0.897    -481.3286    422.0327
            agecu |  -.1739568   1.936886    -0.09   0.928    -3.976566    3.628653
            _cons |  -55354.68     112007    -0.49   0.621    -275253.4    164544.1
     ------------------------------------------------------------------------------

     Note: after age cubed is entered, none of the coefficients is statistically significant (even though age and age squared were significant in the quadratic model).
     So, since $\beta_{agecubed}$ is not statistically significant, revert to the quadratic model (DON'T conclude that age has no relationship with earnings!)
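
The slides do not show how agecu was created. A plausible construction, parallel to agesq on the earlier slide, is sketched here as an assumption. The storage type matters, because 54 cubed does not fit in a Stata int. The correlate line is an added illustration of why all three age coefficients lose significance together.

```stata
* Assumed construction of agecu; the slide does not show it.
* 54^3 = 157464 exceeds the largest value an int can hold (32,740), so use long.
gen long agecu = age^3
summarize age agesq agecu

* age, agesq and agecu are very highly correlated over ages 25-54,
* which inflates the standard errors of all three coefficients
correlate age agesq agecu
```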

  14. Cubic Polynomials: Actual Results

  15. Inferences: F-tests Comparing models
     Comparing Regression Models, Agresti & Finlay, p 409:
     $F = \dfrac{(R_c^2 - R_r^2) / (k - g)}{(1 - R_c^2) / [N - (k + 1)]}$, with $df_1 = k - g$ and $df_2 = N - (k + 1)$
     Where:
     $R_c^2$ = R-square for complete model,
     $R_r^2$ = R-square for reduced model,
     k = number of explanatory variables in complete model,
     g = number of explanatory variables in reduced model, and
     N = number of cases.

  16. Example: F-tests Comparing models
     • Complete model: men's earnings on
       • age,
       • age squared,
       • age cubed,
       • education, and
       • currently married dummy.
     • Reduced model: men's earnings on
       • education and
       • currently married dummy.
     • The F-test comparing the two models asks whether the age variables, as a group, have a significant relationship with earnings after controlling for education and marital status.

  17. Example: F-tests Comparing models
     Complete model: men's earnings

     . regress conrinc age agesq agecu educ married if sex==1

           Source |       SS       df       MS              Number of obs =     725
     -------------+------------------------------           F(  5,   719) =   45.08
            Model |  1.1116e+11     5  2.2233e+10           Prob > F      =  0.0000
         Residual |  3.5461e+11   719   493199914           R-squared     =  0.2387
     -------------+------------------------------           Adj R-squared =  0.2334
            Total |  4.6577e+11   724   643334846           Root MSE      =   22208

     ------------------------------------------------------------------------------
          conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
     -------------+----------------------------------------------------------------
              age |   5627.049   8127.377     0.69   0.489    -10329.18    21583.27
            agesq |  -75.30909   210.0421    -0.36   0.720    -487.6781    337.0599
            agecu |   .1985975   1.768176     0.11   0.911    -3.272807    3.670003
             educ |   3555.331   317.9738    11.18   0.000     2931.063    4179.599
          married |   8664.627   1690.098     5.13   0.000      5346.51    11982.74
            _cons |  -127148.4   102508.3    -1.24   0.215    -328399.8    74103.01
     ------------------------------------------------------------------------------

     Note: none of the three age coefficients is, by itself, statistically significant.
     $R_c^2$ = .2387; k = 5.

  18. Example: F-tests Comparing models
     Reduced model: men's earnings

     . regress conrinc educ married if sex==1

           Source |       SS       df       MS              Number of obs =     725
     -------------+------------------------------           F(  2,   722) =   80.20
            Model |  8.4666e+10     2  4.2333e+10           Prob > F      =  0.0000
         Residual |  3.8111e+11   722   527850916           R-squared     =  0.1818
     -------------+------------------------------           Adj R-squared =  0.1795
            Total |  4.6577e+11   724   643334846           Root MSE      =   22975

     ------------------------------------------------------------------------------
          conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
     -------------+----------------------------------------------------------------
             educ |   3650.611   328.1065    11.13   0.000     3006.454    4294.767
          married |   10721.42   1716.517     6.25   0.000     7351.457    14091.38
            _cons |   -16381.3   4796.807    -3.42   0.001    -25798.65   -6963.944
     ------------------------------------------------------------------------------

     $R_r^2$ = .1818; g = 2.

  19. Inferences: F-tests Comparing models
     $F = \dfrac{(0.2387 - 0.1818) / (5 - 2)}{(1 - 0.2387) / (725 - 6)} = \dfrac{0.0569 / 3}{0.7613 / 719} = \dfrac{0.01897}{0.001059} = 17.91$
     $df_1$ = 5 - 2 = 3; $df_2$ = 725 - 6 = 719
     F = 17.91, df = (3, 719), p < .001 (Agresti & Finlay, table D, page 673)
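
The comparison on this slide is done by hand from the two R-squared values. The sketch below shows one way to get the same joint test directly in Stata and to check the hand arithmetic. It assumes the variables from the earlier slides are in memory; the scalar name Fstat is a hypothetical choice.

```stata
* complete model, then a joint test that all three age coefficients are zero
regress conrinc age agesq agecu educ married if sex==1
test age agesq agecu

* the same F computed by hand from the R-squared values printed on the slides
scalar Fstat = ((0.2387 - 0.1818)/(5 - 2)) / ((1 - 0.2387)/(725 - 6))
display Fstat

* upper-tail p-value for an F statistic with (3, 719) degrees of freedom
display Ftail(3, 719, Fstat)
```

The F reported by test may differ slightly from the hand calculation, because the R-squared values on the slides are rounded to four decimal places.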

  20. Next: Regression with Interaction Effects • Examples with earnings: • married x gender • age x gender • age x education • marital status x gender
