Analysis of ordinal repeated categorical response data by using marginal model (Maximum likelihood approach) by Abdul Salam. Instructor: K.C. Carriere. Stat 562.


  1. Analysis of ordinal repeated categorical response data by using marginal model (Maximum likelihood approach) by Abdul Salam Instructor: K.C. Carriere Stat 562

  2. Contents: • Introduction • Background of data • Objective of the study • Basic theory • Marginal model • Model fitting using ML • SAS Codes • Results • Conclusion

  3. Introduction • Definition • Categorical data • Repeated categorical data • Advantages and Disadvantages of repeated Measurements Designs

  4. Definition • Categorical data • Categorical data fits into a small number of discrete categories (as opposed to continuous). Categorical data is either non-ordered (nominal) such as gender or city, or ordered (ordinal) such as high, medium, or low temperatures.

  5. Definition (cont-) • Repeated categorical data • The term “repeated measurements” refers broadly to data in which the response of each experimental unit or subject is observed on multiple occasions or under multiple conditions. When the response is categorical then it is called repeated categorical data.

  6. Definition (cont-) • Application of Repeated categorical data • Repeated categorical response data occur commonly in health-related applications, especially in longitudinal studies. For example, a physician might evaluate patients at weekly intervals regarding whether a new drug treatment is successful. In some cases the explanatory variables also vary over time.

  7. Advantages of Repeated Measurements Designs • Individual patterns of change can be studied. • They provide more efficient estimates of relevant parameters than cross-sectional designs with the same number and pattern of measurements. • Between-subject sources of variability can be excluded from the experimental error.

  8. Disadvantages of Repeated Measurements Designs • Analysis of repeated data is complicated by the dependence among the repeated observations made on the same experimental unit. • Often the investigator cannot control the circumstances for obtaining measurements, so the data may be unbalanced or partially incomplete.

  9. Background of Insomnia data • A randomized, double-blind clinical trial was performed to compare an active hypnotic drug with a placebo in patients who have insomnia problems. The outcome variable is the patient's response to the question "How quickly did you fall asleep after going to bed?", measured using four categories (<20 minutes, 20-30 minutes, 30-60 minutes, and >60 minutes). Patients were asked this question before and following a two-week treatment period.

  10. Background of Insomnia data • Patients were randomly assigned to one of two treatments, active or placebo. The two treatments form a binary explanatory variable. Patients receiving the two treatments were independent samples.

  11. Table 1: Time to falling asleep, by treatment and occasion (n = 239).

  12. Objectives • To study the effect of time on the response. • To study the effect of treatment on the response. Is the time to fall asleep shorter for the active treatment than for placebo? • Is there any interaction between treatment and time? How does the treatment affect the time to fall asleep over time?

  13. Pharmaceutical Company Interest • The company hopes that patients on the active treatment have a significantly higher rate of improvement than patients on placebo.

  14. Generalized linear model approaches to the analysis of Repeated Measurements Designs • Marginal models • Random effects models • Transition models

  15. Basic Theory

  16. GLMs for ordinal response. • Extensions of generalized linear model (GLM) methodology for the analysis of repeated measurements accommodate discrete or continuous, time-independent or time-dependent covariates. A GLM has three components: a random component, which identifies the response variable Y and its probability distribution; a systematic component, which specifies the explanatory variables used in a linear predictor function; and a link function, which specifies the functional relationship between the systematic component and E(Y).

  17. Random Component. • Because the response is ordinal, it is often advantageous to construct logits that account for the category ordering and are less affected by the number or choice of response categories. These are built from the cumulative response probabilities, from which the cumulative logits are defined. Consider an ordinal response with c + 1 ordered categories labeled 0, 1, 2, …, c for each individual or experimental unit. The cumulative response probabilities are $P(Y \le j)$, $j = 0, 1, \ldots, c$. Thus $P(Y \le 0) \le P(Y \le 1) \le \cdots \le P(Y \le c) = 1$.
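A small Python sketch (not part of the original SAS analysis) may help make the cumulative probabilities concrete. It uses the marginal counts for the active group at the initial occasion, obtained by summing the rows of the insomnia data listed on the SAS code slide; the variable names are illustrative assumptions.

```python
import math
from itertools import accumulate

# Marginal counts, active group, initial occasion, for the categories
# <20, 20-30, 30-60, >60 minutes (row sums of the insomnia data).
counts = [12, 20, 40, 47]
total = sum(counts)

# Cumulative probabilities P(Y <= j), j = 0, ..., c, from cumulative counts.
cum_probs = [running / total for running in accumulate(counts)]

# The last cumulative probability equals 1, so it has no finite logit;
# only the first c cumulative probabilities are turned into logits.
cum_logits = [math.log(g / (1 - g)) for g in cum_probs[:-1]]

for j, g in enumerate(cum_probs):
    print(f"P(Y <= {j}) = {g:.4f}")
print("cumulative logits:", [round(v, 3) for v in cum_logits])
```

The cumulative probabilities are nondecreasing by construction, so the cumulative logits are ordered as well.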

  18. Systematic component. • The systematic component of the generalized linear model specifies the explanatory variables. The linear combination of these explanatory variables is called the linear predictor, denoted by $\eta = x'\beta$. The vector $\beta$ characterizes how the cross-sectional response distribution depends on the explanatory variables.

  19. Link Function. • The link function describes the relationship between the random and systematic components, that is, how $\mu = E(Y)$ relates to the explanatory variables in the linear predictor. For an ordinal response with c + 1 categories, one might use the cumulative logit: $\text{logit}_j = \text{logit}[P(Y \le j)]$, $j = 0, 1, \ldots, c - 1$.

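  20. Link Function. In the general model, $\text{logit}[P(Y \le j)] = \alpha_j + x'\beta_j$, each cumulative logit has its own coefficient vector. When the GLM is simplified to the proportional odds model, $\beta_j$ simplifies to $\beta$, indicating the same effect for each logit. The proportional odds model is $\text{logit}[P(Y \le j)] = \alpha_j + x'\beta$ for $j = 0, 1, \ldots, c - 1$.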

  21. Link Function. For individuals with covariate vectors $x^*$ and $x$, the odds ratio for a response at or below category j is $\frac{P(Y \le j \mid x^*) / P(Y > j \mid x^*)}{P(Y \le j \mid x) / P(Y > j \mid x)} = \exp\{(x^* - x)'\beta\}$. The odds ratio does not depend on the response category j. Taking logs recovers the regression coefficient, which is the difference in logit (log odds) of the response per unit change in x.
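The claim that the cumulative odds ratio is the same at every cutpoint can be checked numerically. The Python sketch below is an illustration only: the intercepts `alphas` and the slope `beta` are hypothetical values chosen for the example, not estimates from the insomnia data.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical proportional odds model with a single covariate.
alphas = [-1.0, 0.0, 1.5]   # one intercept per cutpoint (assumed values)
beta = 0.7                  # common slope (assumed value)
x, x_star = 0.0, 1.0        # two covariate values to compare

def cumulative_odds(alpha, x_value):
    """Odds of a response at or below the cutpoint with intercept alpha."""
    p = expit(alpha + beta * x_value)
    return p / (1 - p)

# Odds ratio at each cutpoint, compared with exp{(x* - x) * beta}.
odds_ratios = [cumulative_odds(a, x_star) / cumulative_odds(a, x) for a in alphas]
expected = math.exp((x_star - x) * beta)

print("odds ratio per cutpoint:", [round(v, 6) for v in odds_ratios])
print("exp((x* - x) * beta):   ", round(expected, 6))
```

Every entry of `odds_ratios` agrees with `expected`, whichever intercept is used.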

  22. Maximum Likelihood Method (ML). • The standard approach to maximum likelihood (ML) fitting of marginal models involves solving the score equations using the Newton-Raphson method, Fisher scoring, or some other iteratively reweighted least squares algorithm. ML fitting of marginal logit models is awkward. For T observations on an I-category response, at each setting of the predictors the likelihood refers to $I^T$ multinomial joint probabilities, but the model applies to T sets of marginal multinomial parameters, and the marginal multinomial variates are not independent.
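To see how quickly the joint table grows relative to the marginal parameters, the following Python sketch counts both for a few values of T. The function names are illustrative assumptions; the case I = 4, T = 2 corresponds to the insomnia data.

```python
def joint_cell_count(n_categories, n_occasions):
    """Number of multinomial joint probabilities at one predictor setting: I^T."""
    return n_categories ** n_occasions

def marginal_parameter_count(n_categories, n_occasions):
    """Number of free marginal probabilities at one predictor setting: T * (I - 1)."""
    return n_occasions * (n_categories - 1)

# Insomnia data: I = 4 response categories observed on T = 2 occasions.
print(joint_cell_count(4, 2), marginal_parameter_count(4, 2))

# Growth with the number of occasions for a four-category response.
for occasions in range(2, 7):
    print(occasions, joint_cell_count(4, occasions), marginal_parameter_count(4, occasions))
```

The joint count grows exponentially in T while the marginal count grows only linearly.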

  23. ML: Model Specification. • Consider T categorical responses, where the t-th variable has $I_t$ categories. The ordinal responses are observed for P covariate patterns, defined by a set of explanatory variables. Let $r = \prod_{t=1}^{T} I_t$ denote the number of response profiles for each covariate pattern. The vector of counts for covariate pattern p is denoted by $Y_p$. The $Y_p$ are assumed to be independent multinomial random vectors, $Y_p \sim \text{Mult}(n_p, \pi_p)$ with $1_r^{\top} \pi_p = 1$,

  24. ML: Model Specification. • where $\pi_p$ is a vector of positive probabilities, $n_p$ is the number of subjects with covariate pattern p, and $1_r$ is an r-dimensional vector of 1's. Since the model applies to T sets of marginal multinomial parameters, the marginal models can be written as a generalized linear model with the link function $C \log(A\pi) = X\beta$.

  25. ML Fitting of marginal Models: Lang and Agresti (1994) considered the likelihood as a function of the joint probabilities $\pi$ rather than the model parameters $\beta$. The likelihood function for a marginal logit model is the product of the multinomial mass functions from the various predictor settings. One approach for ML fitting views the model as a set of constraints and uses methods for maximizing a function subject to constraints.

  26. ML Fitting of marginal Models: Let $\theta$ be a vector having as elements $\pi$ and the Lagrange multipliers $\lambda$. The Lagrangian likelihood equations have the form $h(\theta) = 0$, where $h$ is a vector with terms involving the contrasts in marginal logits that the model specifies as constraints, as well as log-likelihood derivatives. The Newton-Raphson iterative scheme is $\theta^{(k+1)} = \theta^{(k)} - [\partial h(\theta^{(k)}) / \partial \theta]^{-1} h(\theta^{(k)})$.

  27. ML Fitting of marginal Models: After obtaining the fitted values $\hat{\pi}$ on convergence of the algorithm, they calculate model parameter estimates using $\hat{\beta} = (X'X)^{-1} X' C \log(A\hat{\pi})$, where the marginal model is written in the form $C \log(A\pi) = X\beta$. This maximum likelihood fitting method makes no assumption about the model that describes the joint distribution. Thus, when the marginal model holds, the ML estimates are consistent regardless of the dependence structure for that distribution.

  28. Inference • Hypothesis testing for parameters: after obtaining the model parameter estimates and their estimated covariance matrix, one can apply standard methods of inference, for instance a Wald chi-squared test for marginal homogeneity. • Goodness-of-fit test: to assess model goodness of fit, one can compare observed and fitted cell counts using the likelihood-ratio statistic G2 or the Pearson chi-squared statistic. For nonsparse tables, assuming that the model holds, these statistics have approximate chi-squared distributions with degrees of freedom equal to the number of constraints implied by the model.

  29. Limitations of ML: • The number of multinomial probabilities increases dramatically as the number of predictors increases. • ML approaches are not practical when T is large or there are many predictors, especially when some are continuous. • It does not make any assumption about the model that describes the joint distribution.

  30. Results: Table 2: Sample marginal proportions for insomnia data.

  31. Figure 1: Sample marginal proportions, insomnia data.

  32. Marginal Proportion • The sample proportion of subjects falling asleep in <20 minutes among those who received the active treatment, at the initial occasion, is (7+4+1+0) / (7+4+1+0+11+…+13+8) = 12/119 = 0.1008. • Similarly, the sample proportion of subjects falling asleep in >60 minutes among those who received placebo, at follow-up, is (1+0+2+22) / (7+4+2+1+…+14+22) = 25/120 = 0.20833. And so on.

  33. What does the marginal proportion table show? • From the initial to the follow-up occasion, time to falling asleep seems to shift downward for both treatments. • The degree of shift seems greater for the active treatment than for placebo, indicating a possible interaction. In other words, the effect of treatment on the response differs between occasions.
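The same arithmetic can be carried out for every cell of the marginal table. The Python sketch below (an illustration outside the original SAS analysis) recomputes the sample marginal proportions from the cell counts given on the SAS code slide; the function and variable names are assumptions made for the example.

```python
categories = ["<20", "20-30", "30-60", ">60"]

# Cell counts from the SAS data step: rows are the initial response,
# columns are the follow-up response, in the category order above.
counts = {
    "active": [[7, 4, 1, 0], [11, 5, 2, 2], [13, 23, 3, 1], [9, 17, 13, 8]],
    "placebo": [[7, 4, 2, 1], [14, 5, 1, 0], [6, 9, 18, 2], [4, 11, 14, 22]],
}

def marginal_proportions(table):
    """Return (n, initial proportions, follow-up proportions) for one treatment."""
    n = sum(sum(row) for row in table)
    initial = [sum(row) / n for row in table]          # row margins
    follow = [sum(col) / n for col in zip(*table)]     # column margins
    return n, initial, follow

results = {name: marginal_proportions(table) for name, table in counts.items()}

for name, (n, initial, follow) in results.items():
    print(f"{name} (n = {n})")
    for label, p_init, p_follow in zip(categories, initial, follow):
        print(f"  {label:>5}: initial = {p_init:.4f}, follow-up = {p_follow:.4f}")
```

The two values worked by hand on the slide appear as the first initial proportion for the active group and the last follow-up proportion for the placebo group.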

  34. Fitted Marginal Model • Let x represent the treatment, with x=1 for the active treatment and x=0 for the placebo. Let t denote the measurement occasion, with t=0 for initial and t=1 for follow-up. Let Yt represent the outcome variable, the patient's response at time t to the question "How quickly did you fall asleep after going to bed?", with j=0 for <20 minutes, j=1 for 20-30 minutes, j=2 for 30-60 minutes, and j=3 for >60 minutes. The marginal model with cumulative logit link can be written for our data set as logit[P(Yt ≤ j)] = αj + β1 t + β2 x + β3 (t × x), with a separate intercept αj for each of the three cutpoints.

  35. SAS code
      data isomnia;
        input treatment $ initial $ follow $ count @@;
        if count=0 then count=1E-8;
        datalines;
      active <20 <20 7      active <20 20-30 4     active <20 30-60 1     active <20 >60 0
      active 20-30 <20 11   active 20-30 20-30 5   active 20-30 30-60 2   active 20-30 >60 2
      active 30-60 <20 13   active 30-60 20-30 23  active 30-60 30-60 3   active 30-60 >60 1
      active >60 <20 9      active >60 20-30 17    active >60 30-60 13    active >60 >60 8
      placbo <20 <20 7      placbo <20 20-30 4     placbo <20 30-60 2     placbo <20 >60 1
      placbo 20-30 <20 14   placbo 20-30 20-30 5   placbo 20-30 30-60 1   placbo 20-30 >60 0
      placbo 30-60 <20 6    placbo 30-60 20-30 9   placbo 30-60 30-60 18  placbo 30-60 >60 2
      placbo >60 <20 4      placbo >60 20-30 11    placbo >60 30-60 14    placbo >60 >60 22
      ;

  36. SAS code
      proc catmod order=data data=isomnia;
        weight count;
        population Treatment;
        response clogit;
        /* Design matrix columns: 1-3 = cutpoints, 4 = treatment,
           5 = time, 6 = time*treatment. One row per marginal logit. */
        model initial*follow=(1 0 0 1 1 1,   /* active, follow-up, j=1 */
                              0 1 0 1 1 1,   /* active, follow-up, j=2 */
                              0 0 1 1 1 1,   /* active, follow-up, j=3 */
                              1 0 0 1 0 0,   /* active, initial, j=1 */
                              0 1 0 1 0 0,   /* active, initial, j=2 */
                              0 0 1 1 0 0,   /* active, initial, j=3 */
                              1 0 0 0 1 0,   /* placebo, follow-up, j=1 */
                              0 1 0 0 1 0,   /* placebo, follow-up, j=2 */
                              0 0 1 0 1 0,   /* placebo, follow-up, j=3 */
                              1 0 0 0 0 0,   /* placebo, initial, j=1 */
                              0 1 0 0 0 0,   /* placebo, initial, j=2 */
                              0 0 1 0 0 0)   /* placebo, initial, j=3 */
              (1 2 3='Cutpoint', 4='Treatment', 5='TIme effect', 6='Time*Treatment effect')
              / freq;
      quit;

  37. Fitted Marginal Model • Fitting the marginal model to the above marginal distributions by maximum likelihood gave the following result: logit[P(Yt ≤ j)] = αj + 1.074 (Occasion) + 0.046 (Treatment) + 0.662 (Occasion × Treatment), with estimated cutpoints α1 = -1.16, α2 = 0.10, α3 = 1.37.

  38. Hypothesis testing for estimators: • For Occasion • β1 = 1.074, S.E.(β1) = 0.162, p-value < 0.0001 • For Treatment • β2 = 0.046, S.E.(β2) = 0.236, p-value = 0.84 • For interaction (Occasion × Treatment) • β3 = 0.662, S.E.(β3) = 0.244, p-value = 0.00665
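The p-values above are consistent with two-sided Wald z tests, which can be reproduced from the reported estimates and standard errors. The Python sketch below assumes that this is how the p-values were obtained (the slides do not state it explicitly); the names are illustrative.

```python
import math

# Estimates and standard errors as reported on the slide.
estimates = {
    "occasion": (1.074, 0.162),
    "treatment": (0.046, 0.236),
    "occasion x treatment": (0.662, 0.244),
}

def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided standard normal p-value."""
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

wald = {name: wald_test(est, se) for name, (est, se) in estimates.items()}

for name, (z, p_value) in wald.items():
    print(f"{name:>22}: z = {z:6.3f}, p = {p_value:.5f}")
```

Small differences in the last digit from the slide values are expected, because the inputs here are rounded to three decimals.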

  39. Model Goodness of fit test • The likelihood-ratio statistic (G2) was used for the goodness-of-fit test. Comparing the observed to the ML fitted cell counts, when modeling the 12 marginal logits with these six parameters, gives G2 = 8.0 with df = 6 and p-value 0.238, indicating that the model fits the data well.
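The degrees of freedom and the p-value can be checked with a few lines of Python. This sketch relies on the closed-form chi-squared survival function that holds for even degrees of freedom, which avoids any external library; the function name is an assumption made for the example.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variate with even df.

    For df = 2m, P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0    # k = 0 term of the sum
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

n_marginal_logits = 12   # 2 treatments x 2 occasions x 3 cutpoints
n_parameters = 6         # 3 cutpoints + occasion + treatment + interaction
df = n_marginal_logits - n_parameters

g2 = 8.0
p_value = chi2_sf_even_df(g2, df)
print(f"df = {df}, G2 = {g2}, p-value = {p_value:.3f}")
```

With G2 = 8.0 and df = 6 the sum is 1 + 4 + 8 = 13, so the p-value is 13·exp(-4), which rounds to 0.238.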

  40. Interpretation of Parameters Effect of Treatment (Active vs Placebo): • 1. At the initial observation: • The estimated odds that the time to falling asleep for the active treatment is below any fixed level equal exp{0.046} = 1.05 times the estimated odds for the placebo treatment. • 2. At the follow-up observation: • The estimated odds that the time to falling asleep for the active treatment is below any fixed level equal exp{0.046 + 0.662} = 2.03 times the estimated odds for the placebo treatment.
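These odds ratios follow directly from exponentiating the fitted coefficients. The Python sketch below reproduces them; the Wald interval for the initial occasion is an added illustration that assumes a normal approximation with the 1.96 quantile, and no interval is given for follow-up because the slides do not report the covariance between the two estimates.

```python
import math

beta_treatment = 0.046      # treatment effect at the initial occasion
se_treatment = 0.236        # its reported standard error
beta_interaction = 0.662    # additional treatment effect at follow-up

# Cumulative odds ratios, active versus placebo.
or_initial = math.exp(beta_treatment)
or_follow_up = math.exp(beta_treatment + beta_interaction)

# Approximate 95% Wald interval for the initial-occasion odds ratio (assumption:
# normal approximation on the log odds scale).
ci_initial = (
    math.exp(beta_treatment - 1.96 * se_treatment),
    math.exp(beta_treatment + 1.96 * se_treatment),
)

print(f"odds ratio at initial occasion: {or_initial:.3f}")
print(f"odds ratio at follow-up:        {or_follow_up:.3f}")
print(f"95% interval at initial:        ({ci_initial[0]:.3f}, {ci_initial[1]:.3f})")
```

The interval at the initial occasion contains 1, in line with the non-significant treatment main effect.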

  41. Interpretation of Parameters (cont.) • For the active treatment the occasion effect (slope) is β3 = 0.662 (SE = 0.244) higher than for the placebo, giving strong evidence of faster improvement. In other words, initially the two treatments had similar effects, but at follow-up the patients on the active treatment tended to fall asleep more quickly.

  42. Conclusion • Using maximum likelihood fitting of the marginal model for the insomnia data set, we have sufficient evidence to conclude that treatment and time have substantial effects on the response (time to fall asleep).

  43. Thank You For Your Attention
