PM 515 Behavioral Epidemiology Generalized Linear Regression Analysis

Presentation Transcript

  1. PM 515 Behavioral Epidemiology: Generalized Linear Regression Analysis. Ping Sun, Ph.D.; Jennifer Unger, Ph.D.

  2. Topics • Review • Probit Regression • Introduction to Logistic Analysis • Logistic analysis with data from a case-control study • Graphical Presentation of Logistic Regression Results • Polychotomous logistic regression • Unconditional and conditional maximum likelihood methods in logistic regression • Survival analysis • Multi-level random coefficients modeling with a binary outcome • Multi-level random coefficients modeling with a count outcome

  3. Review: What is a linear relationship? • A function y = f(x) needs to satisfy two requirements: • 1) Homogeneity: if y = f(x), then c * y = f(c * x), where c is a constant • 2) Additivity: if y1 = f(x1) and y2 = f(x2), then y1 + y2 = f(x1 + x2) • Example: fat intake amount and caloric intake from fat

  4. Review: The formula for linear regression. Y = α + β x. If we define Y’ = Y - α, then Y’ = β * x. It follows that 1) c * Y’ = β * (c * x); 2) if Y’1 = β * x1 and Y’2 = β * x2, then Y’1 + Y’2 = β * (x1 + x2). So Y’ is linear in x, while Y in the regression formula is affine in x (linear plus a constant).
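The two requirements for linearity can be checked numerically. The following is a minimal Python sketch; the values of `ALPHA` and `BETA` and the sample points are arbitrary illustrations, not numbers from the slides.

```python
import math

ALPHA, BETA = 2.0, 0.5  # illustrative values only, not from the slides


def linear(x):
    """Y' = beta * x (intercept removed)."""
    return BETA * x


def affine(x):
    """Y = alpha + beta * x (regression form with an intercept)."""
    return ALPHA + BETA * x


def is_linear(f, x1=3.0, x2=4.0, c=2.5):
    """Check homogeneity and additivity of f at a few sample points."""
    homogeneous = math.isclose(c * f(x1), f(c * x1))
    additive = math.isclose(f(x1) + f(x2), f(x1 + x2))
    return homogeneous and additive


print(is_linear(linear))  # the intercept-free form passes both checks
print(is_linear(affine))  # the form with an intercept fails them
```

A check at a handful of points cannot prove linearity in general, but a single failure is enough to show that the intercept breaks both requirements.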

  5. Review: The regular linear regression (Y = α + β x + ζ) does not, strictly speaking, model a linear association, because of the intercept α. It becomes a ‘pure’ linear relationship after a simple conversion of Y: Y1 = Y - mean(Y), with x centered at its mean in the same way.

  6. Review: Linear regression analysis. Y = α + β x + ζ

  7. Review: Linear (straight-line) regression analysis. Y = α + β x + ζ. Assumptions: the errors ζ are independent and identically distributed (i.i.d.) and follow a Gaussian distribution. Simply put: ζ is assumed to be normally distributed, so it can theoretically range from -∞ to +∞.

  8. Review: Another linear regression with a small conversion: log-linear regression analysis • Log(Y) = α + β1 x1 + β2 x2 + ζ • Y = exp(α + β1 x1 + β2 x2 + ζ) = exp(α) * exp(β1 x1) * exp(β2 x2) * exp(ζ) • Two major differences from the previous linear model: • Y is proportional to exponential functions of x1 and x2, instead of x1 and x2 themselves • The contributions of x1 and x2 to Y are multiplicative, not additive
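The multiplicative property of the log-linear model can be illustrated with a short Python sketch. The coefficient values below are made up for illustration and the error term ζ is ignored; the point is only that a one-unit increase in x1 multiplies Y by exp(β1) regardless of the value of x2.

```python
import math


def log_linear_mean(x1, x2, alpha=0.1, b1=0.2, b2=0.3):
    """Y = exp(alpha + b1*x1 + b2*x2); coefficients are illustrative only."""
    return math.exp(alpha + b1 * x1 + b2 * x2)


# Effect of raising x1 from 0 to 1 at two different levels of x2.
ratio_when_x2_is_0 = log_linear_mean(1, 0) / log_linear_mean(0, 0)
ratio_when_x2_is_5 = log_linear_mean(1, 5) / log_linear_mean(0, 5)

print(ratio_when_x2_is_0, ratio_when_x2_is_5, math.exp(0.2))
```

All three printed values agree, which is what "multiplicative, not additive" means in practice: the ratio is constant, while the absolute difference in Y depends on x2.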

  9. Review: Linear (straight-line) regression analysis. Y = α + β x + ζ, where ζ follows a Gaussian distribution. What if Y is dichotomous (binary)? The assumptions are violated!

  10. Review: Linear (straight-line) regression analysis. Y = α + β x + ζ. Now Y is dichotomous (binary), so ζ is certainly bounded. How do we deal with this violation of the assumptions of basic regression?

  11. Method #1: Just treat it as a continuous outcome • If the mean of the binary outcome is not extreme (not too close to 0 or 1) • In large-scale preliminary exploratory analysis

  12. Prevalence of Daily Smoking Among Chinese Youth and Mid-aged Adults, by gender and age group, CSCS pilot survey conducted in 2002

  13. Review: Linear (straight-line) regression analysis. Y = α + β x + ζ, where ζ follows a Gaussian distribution. What if Y is dichotomous (binary)? How do we deal with this violation of the assumptions of basic regression? Apply a transformation to make it linear!

  14. Generalized Linear Regression: Probit Conversion of Y • Can we somehow convert the binary indicator Y to another variable, and then conduct the linear regression analysis? • Answer: Yes and No • Yes: conceptually • No: in algorithm

  15. Basic Requirements for the Candidate Transformations. Y = α + β x + ζ becomes η = a + b x + ζ (Y → η), where Y is dichotomous with possible values 1 or 0, and η is the transformed Y • η needs to be a monotonic function of p(Y=1): the higher p(Y=1) is, the larger the value of η • η needs to have a possible range of (-∞, +∞) • Presumably η = -∞ when p(Y=1) = 0, and η = +∞ when p(Y=1) = 1.

  16. Method 1: Probit Regression. [Figure: normal distribution of η, with regions labeled Y=1 and Y=0] The binary variable Y is a ‘manifestation’ of another variable η. η can be measurable, or it can be latent and not directly measurable. Examples: Y = obese, η = age- and gender-adjusted BMI; Y = CVD, η = disease process.

  17. Method 1: Probit Regression. [Figure: normal distribution of η with mean = 0 and variance = 1, with labels Y=1, Y=0, and Y=0.5] The binary variable Y is a ‘manifestation’ of another variable η. η can be measurable, or it can be latent and not directly measurable. Examples: Y = obese, η = age- and gender-adjusted BMI; Y = CVD, η = disease process.

  18. Method 1: Probit Regression. [Figure: plot against η]

  19. Method 1: Probit Regression. P(Y=1) = α1 + β1 x (Y is binary); η = α2 + β2 x (η is continuous). With α2 and β2 estimated, η can then be computed for each value of x. Based on the estimated η, a probability p can then be computed (via z-score inversion) for each value of x.

  20. Method 1: Probit Regression. [Figure: distribution of η with regions labeled Y=1 and Y=0] Pr(Y=1) = α1 + β1 x (Y is binary); η = α2 + β2 x (η is continuous). Results interpretation: how do we compare the estimates for two values of x (x1 vs. x2)? x1 → η1 → P1; x2 → η2 → P2. With two z-score table look-ups, we can then compare P1 and P2.

  21. Method 1: Probit Regression Example
     proc probit data=CSCS.youth;
       class B_monthly_smoking;
       model B_monthly_smoking = allowance /* other covariates */;
     run;

  22. Method 1: Probit Regression Example. SAS output for male students:
     Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
     Intercept           1   -1.4661          0.0490    -1.5620   -1.3701        896.63      <.0001
     allowance           1    0.0056          0.0011     0.0035    0.0077         27.28      <.0001
     perceived_smoking   1    0.1674          0.0276     0.1134    0.2215         36.88      <.0001
     friends_smoking     1    0.5015          0.0311     0.4405    0.5625        259.40      <.0001
     Other covariates ….. …..
     Or say: η(monthly_smoking) = -1.4661 + 0.0056 * allowance + …

  23. Method 1: Probit Regression Example: Smoking and allowance in Male Chinese Youth. η(smoking) = -1.4661 + 0.0056 * allowance + … If the covariates were centered to mean 0 before the analysis, then for the average youth in the sample we calculate: when allowance = 20 yuan/wk, η(smoking) = -1.35 → p(smoking) = 0.0879; when allowance = 60 yuan/wk, η(smoking) = -1.13 → p(smoking) = 0.1292.
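The z-score look-ups on slide 23 can be reproduced in a few lines of Python. This is a sketch under the slide's own assumption that all other covariates are centered at 0, so only the intercept and the allowance coefficient enter η. The function names are hypothetical; the standard normal CDF is written with `math.erf` from the standard library.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(allowance, intercept=-1.4661, slope=0.0056):
    """p(smoking) = Phi(eta), with eta = intercept + slope * allowance.

    Assumes every other covariate is at its centered mean of 0.
    """
    eta = intercept + slope * allowance
    return normal_cdf(eta)


for allowance in (20, 60):
    print(allowance, round(probit_probability(allowance), 4))
```

The results agree with the 0.0879 and 0.1292 quoted on the slide to within rounding.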

  24. Method 1: Probit Regression Example: Smoking and allowance in Chinese Youth. [Figure, annotated P=0.24]

  25. Method 1: Probit Regression Example. When allowance = 20, p(smoking) = 0.09. When allowance = 60, p(smoking) = 0.13. Likely presentation in the paper: Allowance was significantly and positively related to monthly smoking in Chinese male adolescents (p<0.0001). Among adolescents who received a weekly allowance of 20 yuan, 9% smoked during the 30 days before the survey; among those who received a weekly allowance of 60 yuan, 13% did.

  26. Is there any other way that will make it a little easier to interpret and perceive the results?

  27. Method 2: Logistic Regression. Probability: P ranges over (0, 1). Odds: P/(1-P) ranges over (0, +∞). Log of Odds: Log(P/(1-P)) ranges over (-∞, +∞).

  28. Method 2: Logistic Regression.
     η = Log(p/(1-p))
     Exp(η) = p/(1-p)
     (1-p) * Exp(η) = p
     Exp(η) = p + p * Exp(η)
     p = Exp(η) / (1 + Exp(η)) = 1 / (1 + Exp(-η))
     Or: p = logistic(η) = 1 / (1 + e^(-η))
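The algebra on slide 28 says that the logit and the logistic function are inverses of each other. A minimal Python sketch of the pair follows; the function names are chosen here for illustration.

```python
import math


def logit(p):
    """Log odds: eta = log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic(eta):
    """Inverse of the logit: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


# Round trip: probability -> log odds -> probability.
for p in (0.05, 0.5, 0.9):
    print(p, logit(p), logistic(logit(p)))
```

A probability of 0.5 maps to η = 0, and probabilities approaching 0 or 1 map to large negative or large positive η, which is the (-∞, +∞) range required of a candidate transformation.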

  29. Method 2: Logistic Regression. [Figure: distribution of η] Mean = 0, variance = π²/3 = 3.29

  30. Similarities and differences between the Logit function for logistic regression and Gaussian probability function for Probit regression

  31. Method 2: Logistic Regression. P(Y=1) = α1 + β1 x (Y is binary); Logit(p) = log(p/(1-p)) = η = α2 + β2 x (η is continuous). With α2 and β2 estimated, logit(p) can then be computed for each value of x. Based on the estimated logit(p), a probability p can then be computed for each value of x.

  32. Method 2: Logistic Regression. Logit(p) = log(p/(1-p)) = η = α2 + β2 x • With Logit(p) calculated, what can be inferred from the results? • log(p/(1-p)) = α2 + β2 x • p/(1-p) = exp(α2 + β2 x) • → The Odds (p/q, where q = 1-p) can be calculated for each value of x

  33. Method 2: Logistic Regression. Odds = p/(1-p) = exp(α2 + β2 x) • Odds1 = p1/(1-p1) = exp(α2 + β2 x1) • Odds2 = p2/(1-p2) = exp(α2 + β2 x2) • Odds Ratio = Odds1 / Odds2 • = exp(α2 + β2 x1) / exp(α2 + β2 x2) • = exp(β2 x1) / exp(β2 x2) • = exp(β2 (x1 - x2)) • The OR can be readily calculated for any two values of x, and it is a function only of x1 - x2, that is, the change in x (Δx). Thus we only need to say something like: for an increase of 1 unit in X, the OR is …

  34. Question 1: If the OR for the onset of smoking is 2 for a one-year increase in age, what will the OR be for a 3-year increase in age? OR (Δx=1) = exp(β Δx) = exp(β) = 2. OR (Δx=3) = exp(β Δx) = exp(3β) = (exp(β))³ = 2³ = 8. Answer: For each 3-year increase in age, the OR for the onset of smoking will be 8. Remember: logit(p) is a linear function of x, but the Odds [p/(1-p)] and the Odds Ratio [p1/(1-p1)] / [p2/(1-p2)] are NOT linear functions of x!!!
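Question 1 can be checked with a short Python sketch of OR = exp(β Δx). The function name `odds_ratio` is hypothetical, and β is recovered here from the stated one-year OR of 2.

```python
import math


def odds_ratio(beta, delta_x):
    """OR for a change of delta_x in x: exp(beta * delta_x)."""
    return math.exp(beta * delta_x)


beta = math.log(2.0)        # because OR(delta_x = 1) = exp(beta) = 2
print(odds_ratio(beta, 1))  # one-year increase
print(odds_ratio(beta, 3))  # three-year increase: 2 cubed, not 2 times 3
```

The three-year OR is the one-year OR raised to the third power, which is the sense in which the OR is not a linear function of x.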

  35. Case-control study: a special case of logistic regression. OR (male vs. female) = Odds(male) / Odds(female) = 2/3

  36. Case-control study: a special case of logistic regression.
     data t;
       x=0; y=0; weight=100; output;
       x=1; y=0; weight=200; output;
       x=0; y=1; weight=300; output;
       x=1; y=1; weight=400; output;
     run;
     proc logistic descending data=t;
       model y = x;
       freq weight;
     run;
     Output:
     Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
     Intercept   1    1.0986          0.1155          90.5160      <.0001
     x           1   -0.4054          0.1443           7.8899      0.0050
     Odds Ratio Estimates:
     Effect  Point Estimate  95% Wald Confidence Limits
     x                0.667           0.502    0.885
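For a single binary predictor, the logistic regression on slide 36 reduces to the odds ratio of the 2x2 table, so the SAS output can be checked by hand. The Python sketch below uses the four cell counts from the data step. The function name is hypothetical, and the standard error is the usual large-sample formula for a log odds ratio, sqrt(1/a + 1/b + 1/c + 1/d), with z = 1.96 as an approximation to the normal quantile.

```python
import math


def odds_ratio_2x2(n_x1_y1, n_x1_y0, n_x0_y1, n_x0_y0, z=1.96):
    """Return (OR, lower, upper) for x=1 vs. x=0 from 2x2 cell counts."""
    odds_x1 = n_x1_y1 / n_x1_y0
    odds_x0 = n_x0_y1 / n_x0_y0
    log_or = math.log(odds_x1 / odds_x0)
    # Large-sample standard error of the log odds ratio.
    se = math.sqrt(1 / n_x1_y1 + 1 / n_x1_y0 + 1 / n_x0_y1 + 1 / n_x0_y0)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Cell counts from the data step: (x=1,y=1)=400, (x=1,y=0)=200,
# (x=0,y=1)=300, (x=0,y=0)=100.
or_, lower, upper = odds_ratio_2x2(400, 200, 300, 100)
print(round(or_, 3), round(lower, 3), round(upper, 3))
```

The log of the odds ratio corresponds to the coefficient of x in the SAS output, and the log of the odds in the x=0 group, log(300/100), corresponds to the intercept.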

  37. Logistic Regression Example
     proc logistic descending data=CSCS.youth;
       model B_monthly_smoking = allowance /* other covariates */;
     run;

  38. Logistic Regression Example: Smoking and allowance in Male Chinese Youth. SAS Output, The LOGISTIC Procedure, Analysis of Maximum Likelihood Estimates:
     Parameter          DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
     Intercept           1   -2.6092         0.0939          772.9237      <.0001
     allowance           1    0.0103         0.00195          27.7795      <.0001
     perceived_smoking   1    0.2976         0.0495           36.1555      <.0001
     friends_smoking     1    0.9048         0.0579          244.0508      <.0001
     ---- for other covariates ----

  39. Logistic Regression Example: Smoking and allowance in Male Chinese Youth. logit(smoking) = -2.61 + 0.01 * allowance + … Whether or not the covariates were centered to mean 0, we can always calculate: Log(OR (allowance=60 vs. allowance=20)) = (60-20) * 0.01 = 0.4; OR (allowance=60 vs. allowance=20) = exp(0.4) = 1.49

  40. Logistic Regression Example: Smoking and allowance in Male Chinese Youth. logit(smoking) = -2.61 + 0.01 * allowance + … OR (allowance=60 vs. allowance=20) = 1.49. Likely wording in a paper: Weekly allowance among Chinese boys was positively related to cigarette smoking in the last 30 days (p<0.0001). The odds of monthly smoking were 1.49 times as high (95% CI: 1.28-1.74) for each additional 40 yuan of weekly allowance.

  41. Question 2: How was the 95% CI calculated? Show it in Excel: beta ± se for 1 yuan is 0.0103 ± 0.00195. Question: Somebody reported in a manuscript an OR and its 95% CI as OR (95% CI) = 2 (1-3). Is there anything wrong with the numbers?
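The interval asked about in Question 2 can be sketched in Python instead of Excel. The sketch assumes the usual Wald construction: build the interval for β on the log scale, scale it by Δx = 40, then exponentiate. The function names are hypothetical. Using the rounded coefficient 0.01 gives values that match the 1.49 (1.28-1.74) quoted on slide 40, while the unrounded 0.0103 gives slightly larger values.

```python
import math


def or_with_ci(beta, se, delta_x, z=1.96):
    """Return (OR, lower, upper) for a change of delta_x in x."""
    point = math.exp(beta * delta_x)
    lower = math.exp((beta - z * se) * delta_x)
    upper = math.exp((beta + z * se) * delta_x)
    return point, lower, upper


def is_log_symmetric(or_, lower, upper, tol=0.02):
    """True if log(OR) sits midway between log(lower) and log(upper)."""
    midpoint = 0.5 * (math.log(lower) + math.log(upper))
    return abs(math.log(or_) - midpoint) < tol


rounded = or_with_ci(0.01, 0.00195, 40)      # coefficient as rounded on the slide
unrounded = or_with_ci(0.0103, 0.00195, 40)  # coefficient from the SAS output
print([round(v, 2) for v in rounded])
print([round(v, 2) for v in unrounded])

print(is_log_symmetric(1.49, 1.28, 1.74))  # interval built on the log scale
print(is_log_symmetric(2.0, 1.0, 3.0))     # symmetric on the OR scale instead
```

An interval built this way is symmetric around log(OR), not around the OR itself, which is one way to examine the reported 2 (1-3).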

  42. Logistic vs. Probit Regressions: calculating the OR from probit outputs. Probit: P(allowance=60) = 0.13, so Odds(allowance=60) = p/q = 0.13/(1-0.13) = 0.15; P(allowance=20) = 0.09, so Odds(allowance=20) = p/q = 0.09/(1-0.09) = 0.10; OR (allowance=60 vs. allowance=20) = Odds(allowance=60) / Odds(allowance=20) = 0.15 / 0.10 = 1.50, p<0.0001. Logistic: OR = 1.49, p<0.0001.
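The conversion from probit probabilities to an odds ratio on slide 42 is a two-step calculation, sketched here in Python with hypothetical function names. The inputs are the rounded probabilities 0.13 and 0.09 from the slide; carrying the odds at full precision gives roughly 1.51 rather than the 1.50 obtained from odds rounded to two decimals.

```python
def odds(p):
    """Odds = p / q, where q = 1 - p."""
    return p / (1.0 - p)


def odds_ratio_from_probabilities(p1, p2):
    """OR comparing the group with probability p1 to the group with p2."""
    return odds(p1) / odds(p2)


print(round(odds(0.13), 2), round(odds(0.09), 2))
print(round(odds_ratio_from_probabilities(0.13, 0.09), 2))
```

Either way, the probit-based OR is close to the logistic OR of 1.49 reported on the same slide.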

  43. Logistic vs. Probit Regressions: estimating the percentages from logistic outputs. Logistic: logit(smoking) = -2.61 + 0.01 * allowance + … OR (allowance=60 vs. allowance=20) = 1.49. Lab Exercise 1: Calculate the percentages when allowance = 20 and when allowance = 60, then compare them with the results from the probit analysis.
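One possible way to approach Lab Exercise 1 is sketched below in Python. It is an illustration under an assumption that the slide does not state for the logistic model: all other covariates are centered at 0, as was assumed for the probit example on slide 23, so that only the intercept and the allowance coefficient from the SAS output on slide 38 enter the logit. The function name is hypothetical.

```python
import math


def logistic_probability(allowance, intercept=-2.6092, slope=0.0103):
    """p(smoking) = 1 / (1 + exp(-eta)), eta = intercept + slope * allowance.

    ASSUMPTION: every other covariate is at its centered mean of 0.
    """
    eta = intercept + slope * allowance
    return 1.0 / (1.0 + math.exp(-eta))


for allowance in (20, 60):
    print(allowance, round(100 * logistic_probability(allowance), 1), "%")
```

Unlike the odds ratio, these percentages do depend on where the covariates are set, so they are only comparable with the probit percentages if both models use the same covariate values.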

  44. Graphical Presentation of Results from Logistic Regression. [Figure, annotated P<0.0001 and P=0.17] Log(Odds) is a linear function of X

  45. Graphical Presentation of Results from Logistic Regression. The Odds are no longer a linear function of X

  46. Graphical Presentation of Results from Logistic Regression: OR of monthly smoking among boys (compared with allowance = 20). When talking about an OR, remember that it is a comparison of two Odds

  47. Graphical Presentation of Results from Logistic Regression: OR of monthly smoking among boys (compared with allowance = 60)

  48. Graphical Presentation of Results from Logistic Regression: SAS Statements. Convert a continuous X to a categorical one:
     data d2;
       set d1_out;
       allowance_1=0; allowance_2=0; allowance_3=0; allowance_4=0; allowance_5=0;
       /* allowance_2 to allowance_4 are coded 2, 3, 4 rather than 1, so each coefficient is per coded unit */
       if 0.0 <= allowance <= 2.5 then allowance_1=1;
       else if 7.5 <= allowance <= 7.5 then allowance_2=2;
       else if 15.0 <= allowance <= 15.0 then allowance_3=3;
       else if 25.0 <= allowance <= 35.0 then allowance_4=4;
       else if 45.0 <= allowance then allowance_5=1;
       if male=1;  /* keep male respondents only */
     run;
     proc logistic descending data=d2;
       /* allowance_1 is left out of the model, so it is the reference group */
       model monthcig1 = allowance_2 allowance_3 allowance_4 allowance_5 /* other covariates */;
     run;

  49. To Continue from here

  50. Graphical Presentation of Results from Logistic Regression: Output. The LOGISTIC Procedure, Analysis of Maximum Likelihood Estimates:
     Parameter    DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
     Intercept     1   -2.7529          0.1337         424.0091      <.0001
     allowance_2   1    0.0258          0.0898           0.0827      0.7736
     allowance_3   1    0.1780          0.0543          10.7411      0.0010
     allowance_4   1    0.1254          0.0398           9.9170      0.0016
     ALLOWANCE_5   1    0.8207          0.1623          25.5796      <.0001
     Other terms for the covariates