Looking Ahead of the Curve: an ARIMA Modeling Approach to Enrollment Forecasting


  1. Looking Ahead of the Curve: an ARIMA Modeling Approach to Enrollment Forecasting
     John G. Zhang, Ph.D., Harper College, jzhang@harpercollege.edu

  2. Topics
     • Why forecast
     • How to forecast
     • Why ARIMA
     • What is ARIMA
     • How to ARIMA
     • How ARIMA did
     • Discussion
     47th AIR Annual Forum

  3. Why Forecast
     • Queries and reports: what was
     • Dashboard: what is
     • Forecasts: what will be
     • Enrollment forecasts: more valuable for resource planning

  4. How to forecast
     • Naïve forecast: random walk, moving average
     • Exponential smoothing
     • Markov chain
     • Regression
     • ARIMA
     • Others
     • Combining methods

  5. Why ARIMA
     • Naïve forecast: best guess if no patterns
     • Exponential smoothing: usually designed for one-step-ahead forecasts
     • Markov chain: see reference
     • Regression: frequently violates the assumption of uncorrelated errors
     • ARIMA: worked well, more later
     • Others: see reference
     • Combining methods: non-directional
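The simpler methods in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's code: the function names, the smoothing constant, and the enrollment values are all made up for the example.

```python
def random_walk_forecast(series):
    """Naive forecast: the next value equals the last observed value."""
    return series[-1]


def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` values."""
    recent = series[-window:]
    return sum(recent) / len(recent)


def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    level = series[0]  # assumption: initialize the level at the first value
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical enrollment figures, for illustration only.
enrollment = [100.0, 104.0, 98.0, 106.0]

rw = random_walk_forecast(enrollment)
ma = moving_average_forecast(enrollment, window=2)
es = exponential_smoothing_forecast(enrollment, alpha=0.5)
```

All three produce a single next-period value, which is one reason they are less suited to the multi-period horizons reported later in the deck.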

  6. What is ARIMA
     • AutoRegressive Integrated Moving Average
     • Generally, the model is given by

  7. where
     • $X_t$ is a time series value at time $t$,
     • $\theta_0$ is a constant,
     • $B$ is a backshift or lag operator,
     • $i$ is a number of lags or spans,
     • $\varepsilon_t$ is an error term at time $t$,
     • $\phi$ and $\theta$ are AR and MA parameters, and
     • $p$, $d$, and $q$ are the orders of AR, I, and MA
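Read together with the definitions above and the special cases on the next slide, the general model can be written as follows. This is a reconstruction under that assumption, not necessarily the exact equation shown on the slide; in particular, the placement of the constant $\theta_0$ relative to the differencing term is a choice made here so that the $d = 0$ cases match.

```latex
\[
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^i\Bigr)\,\bigl[(1 - B)^d X_t - \theta_0\bigr]
  = \Bigl(1 - \sum_{i=1}^{q} \theta_i B^i\Bigr)\,\varepsilon_t
\]
```

With $p = q = 1$ and $d = 0$ this reduces to $(1 - \phi_1 B)(X_t - \theta_0) = (1 - \theta_1 B)\varepsilon_t$.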

  8. • If $p = 1$, $d = 0$, $q = 1$, ARMA(1, 1): $(1 - \phi_1 B)(X_t - \theta_0) = (1 - \theta_1 B)\varepsilon_t$
     • If $p = 1$, $d = 0$, $\theta_1 = 0$, AR(1) model: $(1 - \phi_1 B)(X_t - \theta_0) = \varepsilon_t$
     • If $p = 1$, $\phi_1 = 1$, $d = 0$, $\theta_1 = 0$, random walk: $(1 - B)(X_t - \theta_0) = \varepsilon_t$
     • If $\phi_1 = 0$, $d = 0$, $\theta_1 = 0$, constant: $(X_t - \theta_0) = \varepsilon_t$
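The special cases above all follow from one recursion. Expanding the ARMA(1, 1) form gives X_t - θ0 = φ1 (X_{t-1} - θ0) + ε_t - θ1 ε_{t-1}, which the sketch below applies to a fixed list of error terms. The function name, the starting conditions, and the shock values are assumptions made for illustration.

```python
def arma11(phi1, theta1, theta0, shocks):
    """Generate X_t from (1 - phi1*B)(X_t - theta0) = (1 - theta1*B) e_t.

    Assumes the deviation from theta0 and the error term are both zero
    before the first observation.
    """
    values = []
    prev_dev = 0.0    # X_{t-1} - theta0
    prev_shock = 0.0  # e_{t-1}
    for shock in shocks:
        dev = phi1 * prev_dev + shock - theta1 * prev_shock
        values.append(theta0 + dev)
        prev_dev, prev_shock = dev, shock
    return values


shocks = [1.0, -2.0, 0.5, 1.5]  # made-up error terms

# phi1 = 1, theta1 = 0: shocks accumulate (random walk).
random_walk = arma11(phi1=1.0, theta1=0.0, theta0=10.0, shocks=shocks)
# phi1 = 0, theta1 = 0: each value is the constant plus its own shock.
constant = arma11(phi1=0.0, theta1=0.0, theta0=10.0, shocks=shocks)
# 0 < phi1 < 1, theta1 = 0: AR(1), where past deviations decay.
ar1 = arma11(phi1=0.5, theta1=0.0, theta0=10.0, shocks=shocks)
```

Comparing `random_walk` with `constant` shows why the φ1 = 1 case is non-stationary: a shock in the random walk never fades, while in the constant model it affects one period only.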

  9. How to ARIMA
     • Box and Jenkins (1976) notation: (p, d, q)(P, D, Q)s, with the second group giving the seasonal orders and s the seasonal span
     • Four stages: identification, estimation, validation, forecasting

  10. How to ARIMA
      • SPSS Trends module: version 12 worked well; in versions 13 and 14 the algorithms changed, so the same data and the same program gave a different forecast
      • SAS ETS module: the ARIMA procedure is more flexible, its forecasts are consistent, and automation is possible thanks to macros

  11. Identification
      • Series plot
      • Autocorrelation plot
      • Dickey-Fuller test of the unit root hypothesis
      • AR models to compare the log likelihood values for a series and its transformed series

  12. Identification
      • Degree of differencing
      • Order of AR
      • Order of MA
      • Seasonality, if any
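The autocorrelation plot used during identification is built from sample autocorrelations. A minimal sketch in plain Python follows, assuming the usual estimator (lagged cross-products of deviations from the mean, divided by the total sum of squared deviations). The function name and the toy series are illustrative only.

```python
def sample_acf(series, max_lag):
    """Return sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    denom = sum(d * d for d in deviations)
    acf = []
    for k in range(max_lag + 1):
        num = sum(deviations[t] * deviations[t - k] for t in range(k, n))
        acf.append(num / denom)
    return acf


# Toy series that repeats every 4 periods (made-up values).
toy = [10.0, 12.0, 8.0, 11.0] * 5
acf = sample_acf(toy, max_lag=4)
```

For a series that repeats every 4 periods, the autocorrelation at lag 4 stands out against the shorter lags, which is the kind of spike that suggests a seasonal term.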

  13. Estimation
      • Q statistics
      • Goodness-of-fit criteria: variance estimate, Akaike information criterion, Schwarz Bayesian criterion
      • Significance of parameters
      • Residuals analysis
      • Mean Absolute Percent Error
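The two information criteria listed above can be computed from a fitted model's log likelihood. The sketch below uses the standard definitions, AIC = -2 ln L + 2k and SBC = -2 ln L + k ln n; the log likelihood values, parameter counts, and sample size are invented for the comparison and do not come from the presentation.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return -2.0 * log_likelihood + 2.0 * n_params


def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian criterion: penalizes parameters by ln(n_obs)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Two hypothetical candidate models fitted to the same 100 observations.
small = {"loglik": -250.0, "k": 2}
large = {"loglik": -249.0, "k": 4}

aic_small = aic(small["loglik"], small["k"])
aic_large = aic(large["loglik"], large["k"])
sbc_small = sbc(small["loglik"], small["k"], n_obs=100)
sbc_large = sbc(large["loglik"], large["k"], n_obs=100)
```

In this made-up case the larger model fits slightly better but both criteria prefer the smaller one, because the gain in log likelihood does not cover the penalty for two extra parameters.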

  14. Data
      • Time series data
      • Date variable: year, quarter, month, week, day, hour, minute, second
      • Enrollment data: FTE, headcount, seatcount
      • Data points
      • Nature of the series determines the forecast

  15. Patterns of Data
      • Trend: steady increase or decrease in the values of a time series
      • Cycle: long-term patterns of rising and falling data
      • Seasonality: regular change in the data values that occurs at the same time in a given period

  16. FTE

  17. FTE Pattern
      • Trend: FTE increased from 1998 to 2006, suggesting the series is non-stationary and differencing is necessary
      • Seasonal: FTE is higher in the Fall and Spring and lower in the Summer every year, implying a seasonal factor should be part of the model-building process
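Both observations above point to differencing. A minimal sketch of regular and seasonal differencing in plain Python is shown below; the synthetic series (a linear trend plus a repeating pattern) and its seasonal span of 3 are assumptions chosen to keep the arithmetic visible, not the actual FTE data.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic series: trend of +1 per period plus a repeating pattern of
# span 3 (hypothetical stand-in for a high, high, low seasonal shape).
season = [5, 4, 0]
series = [t + season[t % 3] for t in range(9)]

regular = difference(series, lag=1)    # removes the trend, keeps the season
seasonal = difference(series, lag=3)   # removes the season, leaves the drift
both = difference(seasonal, lag=1)     # removes what is left of the trend
```

Here the regular difference still cycles, the seasonal difference is flat at the per-season growth, and applying both leaves zeros. With real data the result would be noisy rather than exactly constant.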

  18. Autocorrelations and Partial Autocorrelations (ACF and PACF)

      Lag        ACF       PACF
        0    1.00000
        1    0.64901    0.64901
        2    0.29267   -0.22210
        3   -0.06855   -0.28449
        4   -0.42111   -0.37073
        5   -0.42944    0.18006
        6   -0.43520   -0.26468
        7   -0.40880   -0.29117
        8   -0.38067   -0.45581
        9   -0.06784    0.72564
       10    0.25681    0.06626
       11    0.55983    0.26005
       12    0.85774    0.18460
       13    0.55625   -0.22575
       14    0.24975    0.14806
       15   -0.06186    0.10247
       16   -0.36715    0.16423
       17   -0.37708   -0.18254
       18   -0.38454    0.15059
       19   -0.36197   -0.04279
       20   -0.33780    0.11045
       21   -0.07144   -0.18268
       22    0.20576    0.08106
       23    0.46222   -0.06703

  19. Q Statistics
      Autocorrelation Check of Residuals

      To Lag   Chi-Square   DF   Pr > ChiSq   Autocorrelations
           6       385.69    6       <.0001   0.937  0.874  0.808  0.743  0.727  0.711
          12       777.02   12       <.0001   0.709  0.707  0.752  0.799  0.833  0.866
          18      1107.12   18       <.0001   0.811  0.755  0.697  0.640  0.624  0.608
          24      1436.47   24       <.0001   0.605  0.603  0.640  0.679  0.706  0.732

      • Q statistics show that the autocorrelations at the various lags are highly statistically significant
      • Autocorrelations were very high
      • Further action needed
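A Q statistic of this kind can be computed directly from the autocorrelations. The sketch below assumes the Ljung-Box form, Q = n(n + 2) Σ r_k² / (n - k), summed over lags 1 to m. The number of observations is not stated on the slide, so `assumed_n` is a placeholder and the result is not expected to reproduce the 385.69 in the table.

```python
def ljung_box_q(autocorrelations, n_obs):
    """Ljung-Box Q for autocorrelations r_1 .. r_m from n_obs observations."""
    total = 0.0
    for k, r in enumerate(autocorrelations, start=1):
        total += r * r / (n_obs - k)
    return n_obs * (n_obs + 2) * total


# First six autocorrelations from the table above.
r_first_six = [0.937, 0.874, 0.808, 0.743, 0.727, 0.711]

assumed_n = 100  # placeholder: the series length is not given on the slide
q6 = ljung_box_q(r_first_six, assumed_n)
```

Under the null hypothesis of no autocorrelation, Q is compared with a chi-square distribution. Autocorrelations this large give a Q far above any usual critical value, which is what the "<.0001" column reports.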

  20. FTE Forecast

  21. How ARIMA Did
      • Accuracy: what matters most
      • 2-period ahead: 0.74% (FTE), 0.50% (HC)
      • 6-period ahead: 1.43% (FTE), 1.65% (HC)
      • 10-period ahead: 1.40% (FTE), 2.52% (HC)
      • Forecast error grows further into the future
      • Eleanor S. Fox (2005): 1.2% (4), 4.1% (8)
      • NCES (2003): 1.9% (2), 3.6% (6)
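Percent errors like those above can be computed once the actual values for the forecast periods are known. The sketch below assumes the measure is the Mean Absolute Percent Error listed on the Estimation slide; the function name and the actual and forecast numbers are made up for illustration.

```python
def mape(actual, forecast):
    """Mean absolute percent error, in percent, over paired values."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical actual and forecast values for two periods.
actual = [1000.0, 2000.0]
forecast = [990.0, 2040.0]

error_pct = mape(actual, forecast)  # individual errors are 1% and 2%
```

Computing the measure separately over the first 2, 6, and 10 forecast periods would give a horizon-by-horizon comparison of the kind shown on the slide.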

  22. Discussion
      • In theory, other factors can be included along with the time series itself, as in regression:
      • Unemployment rate
      • Consumer Price Index (CPI)
      • High school student population
      • District population
      • Tuition
      • Forecasts used for forecasting?

  23. Discussion
      • Stationarity and homogeneity
      • Scarcity and spuriousness
      • Seasonality and outliers
      • Raw or cooked data
      • Data mining and stepwise
      • Fit and accuracy
      • Additive or multiplicative (subset/factored)

  24. Discussion
      • Science and art
      • Objective and subjective
      • Quantitative and qualitative
      • Over-differencing and over-fitting
      • Parsimony and uncertainty
      • Simple or complex
