1. Forecasting Health Services with Time Series Methods: A Case Study
Tim Bruckner
Assistant Professor
Public Health & Planning, Policy, and Design
University of California, Irvine
tim.bruckner@uci.edu
3. The value of forecasts?
Professor of Economics, Yale University, September 1929
“Stock prices have reached what looks like a permanently high plateau”
4. The value of forecasts?
Professor of Economics, Yale University, September 1929
“Stock prices have reached what looks like a permanently high plateau”
(spoken just before the stock market crash and the Great Depression)
5. We must make decisions today
Policymakers must allocate health budgets and set priorities based, in part, on expectations of future need and capacity
Time series methods provide forecasting options that, in the long run, routinely outperform other regression methods
6. Learning Objectives
Be able to clearly delineate forecasting goals given the context of the situation
Describe three general forms of autocorrelation in a time series
Understand the univariate ARIMA forecasting strategy and its applications
7. Case Study: Children’s Mental Health
California’s publicly funded children’s Medicaid Early and Periodic Screening, Diagnostic, and Treatment (EPSDT) program
Services for children 5 to 21 years
Serves 130,000 children per month
Annual costs > $1 billion
CA Dept of Mental Health wanted to improve their forecasting accuracy
8. EPSDT: What is the context?
GOALS:
Point forecast of annual total costs (vs. interval, monthly)
Error less than 4%
12 to 24 month lead time (2 years ahead)
Flexibility: incorporate “what if” policy changes into forecasts
Use their expertise rather than outsource
Transferable process in the presence of staff turnover
DATA:
Of good quality?
Consistently measured?
Cost data not immediately available (6-month delay)
What data are usable? Recent, post-expansion data are most relevant
9. Their Original Forecast
Stepwise Auto-Regression with Linear Trend
Cost2008 = (Wt2007 × Cost2007) + (Wt2006 × Cost2006) + (Wt2005 × Cost2005) + (Wt2004 × Cost2004) + (Wt × TimeTrend) + Error
The method weighted recent years more heavily than earlier years
Good accuracy: 2 - 4 % error
But, 4% of $1 Billion = $40 Million!
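A minimal sketch of this weighted autoregression with a linear trend. The weights, costs, and trend coefficient below are hypothetical, chosen only to illustrate the formula; the actual EPSDT estimates are not given in the slides.

```python
def stepwise_ar_forecast(costs, weights, trend_weight, year_index):
    """Forecast next year's cost as a weighted sum of prior annual costs
    plus a linear time trend, mirroring the formula above.
    costs and weights are ordered oldest-first."""
    ar_part = sum(w * c for w, c in zip(weights, costs))
    return ar_part + trend_weight * year_index

# Illustrative numbers only (billions of dollars), not real EPSDT data:
past_costs = [0.82, 0.88, 0.95, 1.01]   # FY2004 .. FY2007
weights = [0.05, 0.10, 0.25, 0.55]      # recent years weighted more heavily
forecast_2008 = stepwise_ar_forecast(past_costs, weights,
                                     trend_weight=0.02, year_index=5)
```

Note that a 2-4% error on this scale is exactly the $20-40 million problem the slide describes.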
10. Quality of Forecasts based on ARIMA
Forecasts are extrapolations of historical data
A well-behaved history tends to lead to more accurate forecasts
ex: distance between Moon and Earth on Jan 1, 2020
11. Quality of Forecasts
But . . . an erratic history tends to lead to less accurate forecasts
“Forecasting is like driving a car blindfolded with help from someone looking out of the rear window” –anon.
In health policy, we deal with stochastic (not deterministic) series, with varying levels of predictability
12. EPSDT Costs, FY 1994 to 2006
14. AR - I - MA
Autoregressive: tendency for high or low values to exhibit “memory” in subsequent periods; a shock stays in the system indefinitely but diminishes exponentially (ex: temperature set by a thermostat)
Integrated: the time series has a non-constant mean; differencing t−1 from t is the strategy (ex: odometer)
Moving Average: a shock persists for q observations and then is gone; an “echo” in subsequent periods (ex: aftershock)
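The three forms of autocorrelation can be illustrated with toy simulations. The phi and theta values below are arbitrary, not estimated from the EPSDT series.

```python
import random

random.seed(1)

def ar1(n, phi=0.7):
    """Autoregressive: each shock is 'remembered', decaying as phi**k."""
    y, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + random.gauss(0, 1)
        y.append(prev)
    return y

def integrate(shocks):
    """Integrated: a running sum, like an odometer -- the mean drifts."""
    y, total = [], 0.0
    for s in shocks:
        total += s
        y.append(total)
    return y

def ma1(n, theta=0.5):
    """Moving average: each shock 'echoes' once, then is gone."""
    y, prev_shock = [], 0.0
    for _ in range(n):
        a = random.gauss(0, 1)
        y.append(a + theta * prev_shock)
        prev_shock = a
    return y

# Differencing (the 'I' step) recovers the shocks from an integrated series:
shocks = [random.gauss(0, 1) for _ in range(100)]
walk = integrate(shocks)
diffs = [walk[t] - walk[t - 1] for t in range(1, len(walk))]
```

Differencing an odometer-like series turns it back into a level series, which is why the “I” in ARIMA is handled by differencing rather than by an estimated parameter.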
15. Box-Jenkins (ARIMA) models
The expected value of costs is not its mean, due to patterns (e.g., trend, seasonality)
Earlier values of the dependent variable itself are used to remove these patterns, so that the expected value of the residuals = 0
Use the best-fitting ARIMA model to perform “out of sample,” step-ahead forecasts
Speaker notes: The deviation of a particular point from the regression line (its predicted value) is called the residual. The approach is analogous to regression analysis: arrive at the best-fitting model (based on the history of the dependent variable) and then examine the residuals. The observed time series is a realization of an underlying stochastic process, and that realization is used to build a model of the process that generated the series. Box-Jenkins makes no a priori assumptions about the shape of the dependent variable and imposes no filter; the best-fitting ARIMA model is built empirically from the characteristics of the series. Removing autocorrelation from the dependent variable before testing the effect of an independent variable yields the added benefit of avoiding spurious associations induced by shared trends and cycles; the estimated coefficients are net of shared autocorrelation.
A few assumptions:
1. Homogeneous-sense stationarity: the process must be level, E(Yt) = θ0, with no drift or trends; this can be accomplished by differencing (a dth-order backward-shift differencing operator applied to the series, p. 46).
2. Stationary variance: a single constant variance throughout the series; usually satisfied after differencing, or after log-transformation of the data (with first differencing of the log-transformed data).
Because ARIMA models must be identified from the data to be modeled, time series of over 50 observations are recommended.
16. Forecast FY 04-05 from FY 01-FY 04
We cannot assess accuracy for FY 07-08 unless we wait for future observations to become available
Instead, check the forecasting ability of the model using data already at hand; FY 04-05 was the last complete year
Univariate method: depends only on present and past values of the single series being forecasted
17. Out-of-Sample Forecast: FY 04-05
Use monthly values from July 2001 to June 2004 (n = 36)
Identify autocorrelation and specify the appropriate error term
Compute the 1-step-ahead minimum mean squared error (MMSE) forecast
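As a sketch of the 1-step-ahead procedure, suppose (purely for illustration) that the prepared series follows an AR(1); the real model on the next slide uses seasonal differencing and a lag-3 AR term. The helper names `fit_ar1` and `one_step_forecasts` are hypothetical, not from the original analysis.

```python
def fit_ar1(y):
    """Least-squares estimate of phi in y[t] = phi * y[t-1] + error."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

def one_step_forecasts(history, holdout):
    """1-step-ahead MMSE forecasts: refit on all data seen so far,
    forecast the next point, then reveal the actual value and move on."""
    y = list(history)
    preds = []
    for actual in holdout:
        phi = fit_ar1(y)
        preds.append(phi * y[-1])  # AR(1) MMSE forecast is phi * last value
        y.append(actual)
    return preds

# On a noiseless AR(1) series with phi = 0.5 the forecasts are exact:
history = [1.0, 0.5, 0.25, 0.125]
preds = one_step_forecasts(history, [0.0625, 0.03125])
```

Comparing such held-out forecasts against the known FY 04-05 values is what lets the department judge accuracy without waiting for the future to arrive.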
18.
∇12 indicates that the variable has been differenced at lag 12 (i.e., the value at month t−12 subtracted from the value at month t).
Yt is EPSDT costs during month t.
φ3, the “autoregressive” parameter, implies that a proportion (estimated by φ, always less than 1 in absolute value) of the value of Y at month t is “remembered” into month t+3.
at is the error term at month t.
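Assembling these pieces, the slide describes a seasonally differenced series with a lag-3 autoregressive term. A plausible reconstruction of the model (the exact specification is not shown in the extracted text) is:

```latex
(1 - \phi_3 B^3)\,\nabla_{12} Y_t = a_t,
\qquad \nabla_{12} Y_t = Y_t - Y_{t-12}
```

where B is the backshift operator, so B³Yt = Yt−3.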
20. Other ARIMA approaches
“What if” scenarios: the intervention approach
Quantify past shocks to predict the impact of future changes
Example: How did the change in age qualification affect EPSDT costs?
The multivariate approach
Incorporate independent time-series variables on the right-hand side
Example for EPSDT: caseload, services per client, unit cost
If you have lead data on these, in advance of total cost data, this is useful
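A hypothetical sketch of the intervention idea: quantify a past policy shock as a step change in the series. Here the “effect” is simply the post-change minus pre-change mean, a stand-in for the transfer-function estimate a full intervention model would produce; the data and change point are invented for illustration.

```python
def step_effect(series, change_point):
    """Estimate a level shift as the difference between the mean of the
    series after the change point and the mean before it."""
    pre = series[:change_point]
    post = series[change_point:]
    return sum(post) / len(post) - sum(pre) / len(pre)

# Illustrative monthly costs (in $ millions) with a jump at month 6,
# e.g., a hypothetical change in age qualification:
costs = [80, 81, 79, 80, 81, 79, 92, 93, 91, 92, 93, 91]
effect = step_effect(costs, 6)  # estimated level shift, in $ millions
```

Once a past shock is quantified this way, its estimated effect can be added to (or removed from) a forecast to play out a “what if” policy scenario.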
21. Practical Limitations
Data availability often dictates the approach
Requires some practice
ARIMA forecasts perform best for short lead times; long-term is questionable for any method
Considerable heterogeneity of economic conditions in CA
22. Summary
Time series methods routinely outperform other regression-based forecasting approaches
A clear understanding of planning goals and data attributes benefits any forecast
Autoregressive, Integrated, and Moving Average parameters reflect three general forms of autocorrelation
ARIMA forecasts perform best in stable systems, but can flexibly handle perturbations
23. Computer Resources
STATA: tsset; arima; predict; corrgram; ac; pac
SAS: proc ARIMA
SCA: http://www.scausa.com/