1. Ordinal Regression Analysis: Fitting the Proportional Odds Model Using Stata and SAS
Xing Liu
Neag School of Education
University of Connecticut
2. Purpose Introduce Ordinal Logistic Regression Analysis
Demonstrate the use of the proportional odds (PO) model using Stata (V. 9.0)
Compare the results of the proportional odds model using both Stata OLOGIT and SAS LOGISTIC.
3. Why Ordinal Regression Analysis? Ordinal Dependent Variable
SES (high, middle, low)
Degree of Agreement
Ability level (e.g. literacy, reading)
Context is important
4. Why Use Stata and SAS? Both are powerful general-purpose statistical software packages
Stata is more powerful for the analysis of binary and ordinal logistic regression
SAS has two options (ascending and descending) for an ordinal dependent variable
5. Proportional Odds Model (1) One of several possible regression models for the analysis of ordinal data, and also the most common.
Model predicts the ln(odds) of being in category j or beyond.
The effect of an independent variable (IV) is assumed to be invariant across splits
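The invariance across splits can be checked numerically. The sketch below assumes a logistic link and the cut-point form P(Y <= split) = F(cut - x*beta) used on the later latent-variable slides; the cut points, the coefficient, and the function names are hypothetical values chosen for illustration, not estimates from the study.

```python
import math

def logistic_cdf(z):
    """Standard logistic CDF, F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def odds_above(cut, x, beta):
    """Odds that the outcome falls above the split at `cut`,
    assuming P(Y <= split) = F(cut - x * beta)."""
    p_at_or_below = logistic_cdf(cut - x * beta)
    return (1.0 - p_at_or_below) / p_at_or_below

# Hypothetical cut points and slope (illustration only).
cuts = [-1.0, 0.0, 1.5]
beta = 0.7

# Odds ratio for a one-unit increase in x (x = 2 versus x = 1) at each split.
odds_ratios = [odds_above(c, 2.0, beta) / odds_above(c, 1.0, beta) for c in cuts]

for cut, ratio in zip(cuts, odds_ratios):
    print(f"split at cut {cut:+.1f}: odds ratio = {ratio:.4f}")
print(f"exp(beta) = {math.exp(beta):.4f}")
```

Under these assumptions every split gives the same odds ratio, exp(beta), which is what "proportional odds" refers to.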
6. Proportional Odds Model (2) Model predicts cumulative logits across K-1 response option categories.
For K = 6 (here, Y = 0 to 5), these cumulative logits can be used to predict the K - 1 = 5 cumulative probabilities, given the collection of explanatory variables:
Logit = ln(odds);
Probability = odds / (1 + odds)
logits → odds → estimated probability
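The conversion chain on this slide can be sketched in a few lines. The five cumulative logits below are hypothetical placeholders for K - 1 = 5 splits, not values from the fitted model, and the function names are illustrative.

```python
import math

def logit_to_odds(logit):
    """Logit = ln(odds), so odds = exp(logit)."""
    return math.exp(logit)

def odds_to_probability(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1.0 + odds)

def probability_to_logit(p):
    """Inverse of the chain: logit = ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical cumulative logits for the K - 1 = 5 splits (illustration only).
cumulative_logits = [-2.0, -1.0, 0.0, 1.0, 2.0]
cumulative_probs = [odds_to_probability(logit_to_odds(l)) for l in cumulative_logits]

for logit, prob in zip(cumulative_logits, cumulative_probs):
    print(f"logit {logit:+.1f} -> odds {logit_to_odds(logit):.4f} -> probability {prob:.4f}")
```

A logit of 0 corresponds to odds of 1 and a probability of 0.5, which is a quick sanity check on any fitted cumulative logit.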
7. A Latent-variable Model (1) It can be expressed as a latent variable model (Agresti, 2002; Greene, 2003; Long, 1997; Long & Freese, 2006; Powers & Xie, 2000; Wooldridge, 2001)
Assuming a latent variable, Y*, exists, we can define Y* = xβ + e
Let Y* be divided by some cut points (thresholds): a1, a2, a3… aj, and a1<a2<a3…< aj.
The observed child’s literacy proficiency level is the ordinal outcome, y, ranging from 0 to 5
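8. A Latent-variable Model (2) We define:
y = 0 if y* ≤ a1;
y = 1 if a1 < y* ≤ a2;
y = 2 if a2 < y* ≤ a3;
y = 3 if a3 < y* ≤ a4;
y = 4 if a4 < y* ≤ a5;
y = 5 if y* > a5.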
9. A Latent-variable Model (3) We can compute probability of a child attaining each proficiency level.
P(y=0) = P(y* ≤ a1) = P(xβ + e ≤ a1) = F(a1 - xβ);
P(y=1) = P(a1 < y* ≤ a2) = F(a2 - xβ) - F(a1 - xβ);
P(y=4) = P(a4 < y* ≤ a5) = F(a5 - xβ) - F(a4 - xβ);
P(y=5) = P(y* > a5) = 1 - F(a5 - xβ).
We can also compute the cumulative probabilities using the form: P(Y ≤ j) = F(a(j+1) - xβ) for j = 0, ..., 4, since the cut points are indexed from a1 while y starts at 0
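As a worked illustration of these formulas, the sketch below computes all six category probabilities from five cut points and a linear predictor. It assumes F is the logistic CDF (the slide only writes F), and the cut points and the value of xβ are hypothetical numbers, not the study's estimates.

```python
import math

def logistic_cdf(z):
    """Standard logistic CDF, assumed here for F."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(cuts, xb):
    """Return P(y = 0), ..., P(y = len(cuts)) for ordered cut points `cuts`
    and linear predictor `xb`, using differences of F(cut - xb)."""
    cumulative = [logistic_cdf(a - xb) for a in cuts]  # P(y <= 0), ..., P(y <= 4)
    bounds = [0.0] + cumulative + [1.0]                # pad with 0 and 1
    return [bounds[j + 1] - bounds[j] for j in range(len(bounds) - 1)]

# Hypothetical cut points a1 < ... < a5 and linear predictor (illustration only).
cuts = [-2.0, -1.0, 0.0, 1.0, 2.0]
xb = 0.5

probs = category_probabilities(cuts, xb)
for level, p in enumerate(probs):
    print(f"P(y = {level}) = {p:.4f}")
print(f"sum = {sum(probs):.4f}")
```

Because each probability is a difference of adjacent cumulative values, the six probabilities sum to 1 as long as the cut points are in increasing order.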
10. General Logistic Regression Model In a binary logistic regression model, we predict the log(odds) of success from a set of predictors.
In Stata, the ordinal logistic regression model is expressed as:
SAS uses a different ordinal logit model for estimating the parameters
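The equations for this slide did not survive in the transcript. As a sketch only, the forms below follow the parameterizations documented for Stata's ologit and SAS PROC LOGISTIC, written with this deck's symbols (a_j for the cut point or intercept at split j, β for the coefficient vector); they are not necessarily the slide's exact notation.

```latex
\begin{align*}
% Stata ologit: the linear predictor is subtracted from the cut point
\text{Stata:}\quad
  & \ln\frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = a_j - x\beta \\
% SAS PROC LOGISTIC, default (ascending) response order
\text{SAS (ascending):}\quad
  & \ln\frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = a_j + x\beta \\
% SAS PROC LOGISTIC with the DESCENDING option
\text{SAS (descending):}\quad
  & \ln\frac{P(Y \ge j \mid x)}{P(Y < j \mid x)} = a_j + x\beta
\end{align*}
```

Read this way, the Stata and SAS ascending forms share the same left-hand side and differ only in the sign attached to xβ, so their slope estimates have opposite signs. The descending form models the complementary event, so relative to Stata its slopes keep their sign while its intercepts have the opposite sign.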
11. Methodology Sample: ECLS-K Longitudinal Study (NCES)
n= 3365 from 225 schools
Outcome variable: proficiency in early reading
Eight explanatory variables
The PO model with a single explanatory variable was fitted first
Full-model with all eight explanatory variables
The proportional odds assumption of the models was tested
Software packages: Stata and SAS
13. Figure 1: Full-Model Analysis of Proportional Odds Using Stata
14. Figure 2: Brant Test of Parallel Regression (Proportional Odds) Assumption
15. Figure 3: Measure of Fit Statistics for Full-Model
16. Conclusions Both packages produce the same or similar results in model fit statistics and the test of the proportional odds assumption
Stata produces more detailed information on the PO assumption test and on fit statistics
The estimated coefficients and cut points (thresholds) are the same in magnitude but may be reversed in sign
Unlike Stata, SAS (with either the ascending or the descending option) does not negate the signs of the logit coefficients in its model equations
Agresti, A. (2002). Categorical data analysis (2nd ed.). New York: John Wiley & Sons.
Agresti, A. (1996). An introduction to categorical data analysis. New York: John Wiley & Sons.
Allison, P.D. (1999). Logistic regression using the SAS system: Theory and application. Cary, NC: SAS Institute, Inc.
Ananth, C. V., & Kleinbaum, D. G. (1997). Regression models for ordinal responses: A review of methods and applications. International Journal of Epidemiology, 26, 1323-1333.
Armstrong, B. B. & Sloan, M. (1989). Ordinal regression models for epidemiological data. American Journal of Epidemiology, 129(1), 191-204.
Bender, R. & Benner, A. (2000). Calculating ordinal regression models in SAS and S-Plus. Biometrical Journal, 42(6), 677-699.
Bender, R. & Grouven, U. (1998). Using binary logistic regression models for ordinal data with non-proportional odds. Journal of Clinical Epidemiology, 51(10), 809-816.
Brant, R. (1990). Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics, 46, 1171-1178.
Clogg, C. C., & Shihadeh, E. S. (1994). Statistical models for ordinal variables. Thousand Oaks, CA: Sage.
Greene, W. H. (2003). Econometric analysis (5th ed.). Upper Saddle River, NJ: Prentice Hall.
Hosmer, D. W. & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York: John Wiley & Sons.
Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks, CA: Sage.
Long, J. S. & Freese, J. (2006). Regression models for categorical dependent variables using Stata (2nd ed.). College Station, TX: Stata Press.
McCullagh, P. (1980). Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society Ser. B, 42, 109-142.
McCullagh, P. & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman and Hall.
Menard, S. (1995). Applied logistic regression analysis. Thousand Oaks, CA: Sage.
O’Connell, A. A. (2000). Methods for modeling ordinal outcome variables. Measurement and Evaluation in Counseling and Development, 33(3), 170-193.
O’Connell, A. A. (2006). Logistic regression models for ordinal response variables. Thousand Oaks, CA: Sage.
O’Connell, A. A., Liu, X., Zhao, J., & Goldstein, J. (2006, April). Model diagnostics for proportional and partial proportional odds models. Paper presented at the 2006 Annual Meeting of the American Educational Research Association (AERA), San Francisco, CA.
Powers, D. A., & Xie, Y. (2000). Statistical models for categorical data analysis. San Diego, CA: Academic Press.
Wooldridge, J. M. (2001). Econometric analysis of cross section and panel data. Cambridge, MA: The MIT Press.