Forecasting: Something Old, Something New

D. A. Dickey, NC State University. Previously presented at SAS Global Forum 2018. Daily sales of Item 1 over 4 weeks; predict the next 7 days’ sales. Old: PROC ARIMA, ARIMA(p,d,q). New: PROC ESM (Exponential Smoothing Model).

Presentation Transcript


  1. Forecasting: Something Old, Something New. D. A. Dickey, NC State University. Previously presented at SAS Global Forum 2018.

  2. Daily sales of Item 1 over 4 weeks. Predict the next 7 days’ sales.

  3. Old: PROC ARIMA, ARIMA(p,d,q). New: PROC ESM (Exponential Smoothing Model).
     ARIMA(0,1,1): $Y_{t+1} = Y_t + e_{t+1} - \theta e_t$, with $0 < \theta < 1$.
     At time $t$, the prediction of $Y_{t+1}$ is $L_t = Y_t - \theta e_t$.
     $L_t$ = “smoothed level at time $t$” = prediction of $Y$ at time $t+1$.
     Think of $e_t$ as the prediction error at time $t$: $e_t = Y_t - L_{t-1}$.
     Then $L_t$ becomes $Y_t - \theta (Y_t - L_{t-1}) = (1-\theta) Y_t + \theta L_{t-1}$.
     Exponential smoothing: a method, not a model.
     Exponential smoothing: $L_t = w Y_t + (1-w) L_{t-1}$, a weighted average between what happened and what was predicted last time.
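
The two forms of the smoothed level on this slide can be checked numerically. The following is a minimal Python sketch, not SAS output: it assumes the starting level is the first observation (PROC ESM chooses its own starting value), uses the first week of Item 1 from the data slide, and takes the weight 0.459 quoted on the next slide. The function names are illustrative only.

```python
def smooth_weighted(y, w, level0):
    """Weighted-average form: L_t = w*Y_t + (1 - w)*L_{t-1}."""
    levels, level = [], level0
    for obs in y:
        level = w * obs + (1 - w) * level
        levels.append(level)
    return levels


def smooth_error_form(y, theta, level0):
    """ARIMA(0,1,1) form: L_t = Y_t - theta*e_t, with e_t = Y_t - L_{t-1}."""
    levels, level = [], level0
    for obs in y:
        error = obs - level          # one-step prediction error
        level = obs - theta * error  # pull the observation back toward the old level
        levels.append(level)
    return levels


sales = [86, 93, 89, 99, 87, 79, 89]  # first 7 days of Item 1
w = 0.459                             # smoothing weight; theta = 1 - w
a = smooth_weighted(sales, w, level0=sales[0])      # start value is an assumption
b = smooth_error_form(sales, 1 - w, level0=sales[0])

max_gap = max(abs(x - y) for x, y in zip(a, b))
print(max_gap)  # differs from zero only by floating-point rounding
```

Both recursions produce the same sequence of levels, which is the sense in which the method and the model coincide.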

  4. Estimates ($1-\theta$ from ARIMA and $w$ from ESM) are about the same.

         proc esm data=item1 outest=est out=outesm lead=7;
            id date interval=day;
            forecast sales;
         run;
         proc print data=est;
         proc arima data=item1;
            identify var=sales(1);
            estimate q=1 noconstant ml;
            forecast lead=7 out=outarima id=date interval=day;
         run;

         The ARIMA Procedure
         Moving Average Factors
         Factor 1:  1 - 0.541 B**(1)

         _EST_     _STDERR_   _TVALUE_   _PVALUE_
         0.44694   0.11339    3.94149    .000516554

     $w = 1 - \theta = 1 - 0.541 = 0.459$
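
How close "about the same" is can be gauged from the printed numbers alone. This short Python sketch only redoes the slide's arithmetic; measuring the gap in units of the ESM standard error is an added illustration, not something the procedures report.

```python
w_esm, se_esm = 0.44694, 0.11339  # _EST_ and _STDERR_ from the PROC ESM output
theta_arima = 0.541               # MA(1) coefficient from the PROC ARIMA output

w_arima = 1 - theta_arima                      # weight implied by the ARIMA fit
t_value = w_esm / se_esm                       # should agree with the printed _TVALUE_
gap_in_se = abs(w_arima - w_esm) / se_esm      # difference in standard-error units

print(round(w_arima, 3), round(t_value, 3), round(gap_in_se, 2))
```

The two weights differ by a small fraction of one standard error, so the disagreement is well inside estimation noise.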

  5. ESM: forecast = data in the historical period. ARIMA: one-step-ahead forecasts in the historical period. Forecasts are (almost) identical.

  6. Daily sales of 50 items, national chain. Need inventory for 2 weeks. Purchase how many of each item? Visual impression: Different means and variances.

  7. Sales of 50 items: data structure (one row per date and item).

         date      item  sales
         03/16/18     1     86
         03/17/18     1     93
         03/18/18     1     89
         03/19/18     1     99
         03/20/18     1     87
         03/21/18     1     79
         03/22/18     1     89
         03/23/18     1     84
         03/24/18     1     84
         03/25/18     1     71
         03/26/18     1     84
         03/27/18     1     77
         03/28/18     1     69
         03/29/18     1     63
         03/30/18     1     58
         03/31/18     1     77
         04/01/18     1     54
         04/02/18     1     58
         04/03/18     1     64
         04/04/18     1     64
         04/05/18     1     66
         04/06/18     1     71
         04/07/18     1     59
         04/08/18     1     66
         04/09/18     1     68
         04/10/18     1     63
         04/11/18     1     70
         04/12/18     1     66
         03/16/18     2    111
         03/17/18     2    110
         03/18/18     2    108
         03/19/18     2    111
         03/20/18     2    106
         03/21/18     2    110
         03/22/18     2    111
         (more data)
         04/10/18    49    111
         04/11/18    49    106
         04/12/18    49    104
         03/16/18    50     95
         03/17/18    50    103
         03/18/18    50     99
         03/19/18    50     98
         03/20/18    50     94
         03/21/18    50     95
         03/22/18    50     95
         03/23/18    50     95
         03/24/18    50     93
         03/25/18    50     96
         03/26/18    50    101
         03/27/18    50     97
         03/28/18    50     98
         03/29/18    50     96
         03/30/18    50    103
         03/31/18    50     98
         04/01/18    50     99
         04/02/18    50    104
         04/03/18    50     98
         04/04/18    50    100
         04/05/18    50     98
         04/06/18    50     99
         04/07/18    50    104
         04/08/18    50     98
         04/09/18    50    100
         04/10/18    50    103
         04/11/18    50    112
         04/12/18    50    108

  8. Forecast next 14 days and total sales.
     Tasks: Put data and forecasts in “out1.” Accumulate total forecasts for lead=14 days. Put those in the outsum dataset “demand.”

         proc esm data=sales lead=14 out=out1 outsum=demand;
            forecast sales;
            by item;
            id date interval=day accumulate=total;
         run;
         proc sgplot data=out1;
            series X=date Y=sales / group=item;
         proc print data=demand;
            var item predict upper;
         run;

  9. Exponential smoothing forecasts.

         Obs  item  PREDICT    UPPER
           1     1   932.48  1173.74
           2     2  1856.98  1999.36
           3     3  1604.64  1751.10
           4     4  1132.05  1243.37
           5     5  1854.95  1959.20
           6     6  1322.08  1363.76
           7     7  1337.25  1434.15
           8     8  1665.49  1789.10
           9     9  1643.92  1822.79
          10    10  1488.38  1580.74
          11    11  1421.42  1586.60
          12    12  1352.65  1517.73
         (more lines)
          32    32  1534.12  1616.63
          33    33  1336.96  1353.97
          34    34  1516.26  1545.02
          35    35  1886.12  2079.95
          36    36  1787.43  1859.99
          37    37  1760.80  2157.28
          38    38  1405.68  1472.60
          39    39  1987.76  2376.81
          40    40  1187.69  1218.40
          41    41  1320.68  1536.12
          42    42  1833.59  1896.48
          43    43  1841.61  2016.97
          44    44  1467.61  1609.85
          45    45  1265.19  1394.57
          46    46  1494.54  1562.29
          47    47  1805.99  1897.43
          48    48  1587.94  1886.77
          49    49  1456.22  1476.41
          50    50  1494.17  1605.76
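
Because simple exponential smoothing forecasts are flat at the last smoothed level, a 14-day total is just 14 times that level. The Python sketch below illustrates this for Item 1 using the sales listed on the data slide and the weight 0.44694 from the earlier estimate. It assumes the recursion starts at the first observation, which is not how PROC ESM initializes, but with 28 observations the starting value has almost no influence.

```python
# Item 1 daily sales, 03/16/18 through 04/12/18, copied from the data slide.
item1 = [86, 93, 89, 99, 87, 79, 89, 84, 84, 71, 84, 77, 69, 63,
         58, 77, 54, 58, 64, 64, 66, 71, 59, 66, 68, 63, 70, 66]

w = 0.44694          # level weight estimated for Item 1
level = item1[0]     # assumed starting level (an illustration choice)
for obs in item1[1:]:
    level = w * obs + (1 - w) * level

lead = 14
total = lead * level  # flat forecasts: every one of the 14 days equals `level`

print(round(level, 2), round(total, 2))
```

The resulting total should land close to the 932.48 shown for Item 1 in the table, up to rounding of the weight and the initialization difference.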

  10. What about locally trending data? Mortgage rates, monthly, 1986 to Sept. 2016. Rates (left) and differences (right).

  11. Forecast rates and changes with exponential smoothing.
     drate: difference; lrate: log; dlrate: difference of logs.

         proc ESM data=mortgage lead=48 out=out1 outest=betas;
            id date interval=month;
            forecast rate drate lrate dlrate;

         date       rate      drate      lrate     dlrate
         AUG2016   3.55200  -0.005000   1.26751   -.001406668
         SEP2016   3.57900   0.027000   1.27508   0.007572607
         OCT2016   3.57897  -0.015426   1.27508   -.003017797
         NOV2016   3.57897  -0.015426   1.27508   -.003017797
         DEC2016   3.57897  -0.015426   1.27508   -.003017797
         JAN2017   3.57897  -0.015426   1.27508   -.003017797

  12. Parameter estimates.

         proc print noobs data=betas;
            var _name_ _MODEL_ _PARM_ _EST_;

         _NAME_   _MODEL_   _PARM_   _EST_
         rate     SIMPLE    LEVEL    0.99900   (boundary problem!)
         drate    SIMPLE    LEVEL    0.00386
         lrate    SIMPLE    LEVEL    0.99900   (boundary problem!)
         dlrate   SIMPLE    LEVEL    0.00512

  13. What happened?
     • $\theta = 1 - w$, so $w$ near 1 implies $\theta$ near 0.
     • $Y_t = Y_{t-1} + e_t - 0\,e_{t-1}$ is just a random walk!
     • To capture downward movement, perhaps a random walk with drift $b$: $Y_t = Y_{t-1} + b + e_t$, or just linear trend + ARMA errors.
     • ESM models have no intercepts.
     • Differencing again gives $Y_t - Y_{t-1} = Y_{t-1} - Y_{t-2} + 0 + e_t - \eta e_{t-1}$, but the “trend weight” $\gamma = 1 - \eta = 0$.
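
The flat forecasts in the mortgage table follow directly from the boundary estimate. A small Python sketch, assuming the level going into September 2016 equals the August observation (the true prior level is not shown in the transcript, but at this weight it barely matters):

```python
w = 0.999                  # SIMPLE level weight for `rate`, at its upper boundary
aug, sep = 3.552, 3.579    # last two observed rates from the forecast table

level = aug                            # assumed level entering September
level = w * sep + (1 - w) * level      # one smoothing step

forecasts = [level] * 4                # simple smoothing: every lead gets the same value
print([round(f, 5) for f in forecasts])
```

With a weight this close to 1 the level is essentially the last observation, so every forecast repeats it and no downward movement is carried forward, which is the random-walk behavior described on this slide.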

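  14. Local level $L_t$, local trend $T_t$
     • $L_{t-1}$ and $Y_t$ are estimates of level $L_t$.
     • $L_t = \theta Y_t + (1-\theta) L_{t-1}$ is the smoothed estimate.
     • $Y_t < L_{t-1} \Rightarrow L_t < L_{t-1}$ (decrease) $\Rightarrow$ trend?
     • $T_{t-1}$ and the level change $L_t - L_{t-1}$ are trend estimates.
     • Smooth: $T_t = \theta (L_t - L_{t-1}) + (1-\theta) T_{t-1}$. This is “double exponential smoothing.”
     • Holt (linear) uses $L_t = \theta Y_t + (1-\theta)(L_{t-1} + T_{t-1})$
     • Holt (linear) uses $T_t = \gamma (L_t - L_{t-1}) + (1-\gamma) T_{t-1}$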
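
The Holt (linear) recursions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PROC ESM's implementation: the starting level and trend are supplied by hand, the weights 0.5 and 0.3 are arbitrary, and the h-step forecast is taken to be the last level plus h times the last trend.

```python
def holt_linear(y, level_weight, trend_weight, level0, trend0):
    """Run the Holt recursions and return the final (level, trend)."""
    level, trend = level0, trend0
    for obs in y:
        prev_level = level
        # Level: blend the observation with the previous level moved along its trend.
        level = level_weight * obs + (1 - level_weight) * (prev_level + trend)
        # Trend: blend the latest level change with the previous trend.
        trend = trend_weight * (level - prev_level) + (1 - trend_weight) * trend
    return level, trend


def holt_forecast(level, trend, lead):
    """h-step-ahead forecasts: level + h * trend."""
    return [level + h * trend for h in range(1, lead + 1)]


# Hypothetical data lying exactly on the line 10 + 2t, t = 1..12.
y = [10 + 2 * t for t in range(1, 13)]
level, trend = holt_linear(y, 0.5, 0.3, level0=10.0, trend0=2.0)
print(holt_forecast(level, trend, 3))
```

On data that follow a straight line exactly, with starting values on that line, the recursions stay on the line and the forecasts simply extend it.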

  15. Theoretical ARIMA equivalent

         proc arima;
            identify var=Y(1,1);
            estimate q=2 noconstant;

     If $(1-\gamma B)$ or $(1-\theta B)$ equals $1-B$, that cancels one of the differences and version 2 becomes, e.g.

         proc arima;
            identify var=Y(1);
            estimate q=1;

  16. Back to mortgage rates. ESM forecasts: Double, Linear, and Damped Trend.

  17. Don’t like the forecasts? Check parameter estimates!

         Obs  _NAME_   _MODEL_   _PARM_   _EST_     error
           1  rate     DOUBLE    WEIGHT   0.73837
           2  drate    DOUBLE    WEIGHT   0.00389
           3  lrate    DOUBLE    WEIGHT   0.70168
           4  dlrate   DOUBLE    WEIGHT   0.00540
           5  rate     LINEAR    LEVEL    0.99900   ** hit boundary
           6  rate     LINEAR    TREND    0.00100   ** hit boundary
           7  drate    LINEAR    LEVEL    0.00393
           8  drate    LINEAR    TREND    0.00100   ** hit boundary
           9  lrate    LINEAR    LEVEL    0.99900   ** hit boundary
          10  lrate    LINEAR    TREND    0.00100   ** hit boundary
          11  dlrate   LINEAR    LEVEL    0.00520
          12  dlrate   LINEAR    TREND    0.00100   ** hit boundary

  18. Damped Trend Exponential Smoothing
     • Holt (linear) uses $L_t = \theta Y_t + (1-\theta)(L_{t-1} + T_{t-1})$
     • Holt (linear) uses $T_t = \gamma (L_t - L_{t-1}) + (1-\gamma) T_{t-1}$
     • Equivalent to ARIMA(0,2,2).
     • Damped trend has damping coefficient $\phi$.
     • DT uses $L_t = \theta Y_t + (1-\theta)(L_{t-1} + \phi T_{t-1})$
     • DT uses $T_t = \gamma (L_t - L_{t-1}) + (1-\gamma) \phi T_{t-1}$
     • Equivalent to ARIMA(1,1,2).

  19. General ESM code for nonseasonal data

         proc ESM data=mortgage lead=&L out=outD outest=betasD outfor=outF;
            id date interval=month;
            forecast rate / model=Double;
         run;

     Model options:
     • Simple (default, level only, 1 unit root)
     • Linear (Holt’s method, trend, 2 unit roots)
     • Double (trend, 2 unit roots)
     • DampTrend (Damped Trend, 1 unit root)

  20. Damped trend: forecasts asymptote to constants. Damping coefficient small $\Rightarrow$ fast convergence.

         Obs  _NAME_   _MODEL_     _PARM_    _EST_     error
          13  rate     DAMPTREND   LEVEL     0.99900   ** hit boundary
          14  rate     DAMPTREND   TREND     0.99900   ** hit boundary
          15  rate     DAMPTREND   DAMPING   0.31547
          16  drate    DAMPTREND   LEVEL     0.00386
          17  drate    DAMPTREND   TREND     0.00100   ** hit boundary
          18  drate    DAMPTREND   DAMPING   0.00100   ** hit boundary
          19  lrate    DAMPTREND   LEVEL     0.99900   ** hit boundary
          20  lrate    DAMPTREND   TREND     0.99900   ** hit boundary
          21  lrate    DAMPTREND   DAMPING   0.28919
          22  dlrate   DAMPTREND   LEVEL     0.00512
          23  dlrate   DAMPTREND   TREND     0.00100   ** hit boundary
          24  dlrate   DAMPTREND   DAMPING   0.00100   ** hit boundary
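
Why damped-trend forecasts level off, and why a small damping coefficient gets there quickly, can be seen from the standard h-step formula: the forecast is the last level plus the last trend multiplied by the sum of the first h powers of the damping coefficient. The Python sketch below uses that formula with a hypothetical final level and trend (3.58 and -0.02 are made up for illustration); 0.31547 is the damping estimate for `rate` in the table, and 0.9 is an arbitrary contrast.

```python
def damped_forecasts(level, trend, phi, lead):
    """Forecast at horizon h: level + (phi + phi**2 + ... + phi**h) * trend."""
    out, damp_sum = [], 0.0
    for h in range(1, lead + 1):
        damp_sum += phi ** h
        out.append(level + damp_sum * trend)
    return out


def damped_limit(level, trend, phi):
    """Constant the forecasts approach as the horizon grows (needs 0 <= phi < 1)."""
    return level + phi / (1 - phi) * trend


level, trend = 3.58, -0.02   # hypothetical final state, for illustration only
lead = 48

fast = damped_forecasts(level, trend, 0.31547, lead)
slow = damped_forecasts(level, trend, 0.9, lead)

gap_fast = abs(fast[-1] - damped_limit(level, trend, 0.31547))
gap_slow = abs(slow[-1] - damped_limit(level, trend, 0.9))
print(gap_fast, gap_slow)
```

At 48 steps the small-coefficient forecasts are indistinguishable from their limit, while the 0.9 case is still visibly approaching it.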

  21. ARIMA(0,1,1) with intercept vs. ESM (model=linear).

  22. ARIMA(0,1,1) with intercept (drift). Panels: rate; exp(ln(rate)).

  23. Try linear trend + ARIMA(1,0,1) errors (suggested by boundary issues).

         proc arima data=extend plots=(forecast(forecast));
            identify var=rate crosscor=date stationarity=(Dickey);
            estimate input=(date) p=1 q=1 ml;
            forecast lead=360 interval=month id=date;
         run;

         Augmented Dickey-Fuller Tests
         Type    Lags     Tau    Pr < Tau
         Trend      1   -4.33      0.0032
                    2   -3.50      0.0411

         Maximum Likelihood Estimation
                                   Standard                Approx
         Parameter     Estimate       Error    t Value   Pr > |t|
         MU            16.13668     0.68433      23.58     <.0001
         MA1,1         -0.43467     0.04909      -8.85     <.0001
         AR1,1          0.91011     0.02222      40.96     <.0001
         NUM1        -0.0006000  0.00004410     -13.61     <.0001
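
The fitted trend line can be read off the estimates as MU + NUM1 × date. The Python sketch below evaluates it, assuming that `date` is a SAS date value (days since January 1, 1960) and that the last observation is dated the first of September 2016; both are assumptions about the dataset, not facts stated on the slide. The printed NUM1 is rounded, so the results are approximate.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)       # SAS date values count days from this day
MU, NUM1 = 16.13668, -0.0006000    # intercept and slope per day from the output


def sas_date(d):
    """Number of days since the SAS epoch."""
    return (d - SAS_EPOCH).days


def trend(d):
    """Deterministic trend part of the model at calendar date d."""
    return MU + NUM1 * sas_date(d)


last_obs = date(2016, 9, 1)        # assumed date stamp of the final observation
print(sas_date(last_obs), round(trend(last_obs), 3))

# Day on which the fitted line reaches zero, if it were extrapolated indefinitely.
zero_day = SAS_EPOCH + timedelta(days=int(MU / -NUM1))
print(zero_day.year)
```

Under these assumptions the line sits near the observed rate at the end of the data and would cross zero well inside the 360-month forecast horizon, which is worth keeping in mind when reading the long-range forecasts.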

  24. Trend + ARIMA(1,0,1) vs. ESM (model=linear).

  25. Phone lines (per 100 people), Linear method
     • Estimates well within (0,1).
     • Early levels and trends are not very influential (weights not near 0).
     • Forecast overshoots at change point.
     • Readjusts quickly.
     • Forecast bands spread very quickly. This is characteristic of ARIMA(p,2,q) with MA roots not near 1.

  26. Phone lines (per 100 people), Damped trend method
     • Trend weight hits the boundary.
     • Early levels and trends are not very influential.
     • Forecast overshoots at change point.
     • Readjusts quickly.
     • Forecast bands spread very quickly, but not as badly as with the linear method.
     • Forecasts will not become negative.

  27. Seasonal Exponential Smoothing
     • Simple case: smooth by season (month).
     • $L_t = w Y_t + (1-w) L_{t-s}$
     • Monthly data: $L_t = w Y_t + (1-w) L_{t-12}$
     • ARIMA equivalent: $Y_t = Y_{t-12} + e_t - \theta e_{t-12}$, where $\theta = 1 - w$.
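
Smoothing by season means each period is blended only with the level from the same period one cycle earlier. A minimal Python sketch of that recursion follows; the season length of 4, the pattern, the weight 0.2 and the starting levels are all hypothetical choices made to keep the example short.

```python
def seasonal_smooth(y, w, season_length, start_levels):
    """L_t = w*Y_t + (1 - w)*L_{t-s}; start_levels holds the s levels before the data."""
    levels = list(start_levels)
    for obs in y:
        same_season_level = levels[-season_length]  # level one full cycle back
        levels.append(w * obs + (1 - w) * same_season_level)
    return levels[season_length:]                   # drop the starting values


pattern = [5.0, 1.0, 3.0, 8.0]      # hypothetical seasonal pattern, s = 4
y = pattern * 3                     # three identical cycles, no noise
levels = seasonal_smooth(y, w=0.2, season_length=4, start_levels=pattern)

# The forecast for each season of the next cycle is the latest level for that season.
next_cycle = levels[-4:]
print(next_cycle)
```

With noise-free periodic data and starting levels on the pattern, the smoothed levels reproduce the pattern, so the next-cycle forecasts repeat it.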

  28. General ESM code for seasonal data

         proc ESM data=mortgage lead=&L out=outD outest=betasD outfor=outF;
            id date interval=month;
            forecast rate / model=Winters;
         run;

     Model options:
     • Addseasonal | Seasonal (level + seasonal)
     • Winters (trend + multiplicative seasonal)
     • Addwinters (trend + additive seasonal)

  29. Denver snowfall example.

         proc esm data=denversnow out=outsnow lead=48 plot=all outest=betas;
            forecast snow / model=seasonal;
            id date interval=month;
         proc print data=betas;
         run;

  30. Estimate hit the boundary, but forecasts seem reasonable.

         Obs  _NAME_  _MODEL_    _PARM_   _EST_        _TVALUE_   _PVALUE_
           1  snow    SEASONAL   LEVEL    .007586678    2.49908    0.01282
           2  snow    SEASONAL   SEASON   .001000000    0.08875    0.92932

  31. Response to boundary problem, $w$ near 0 ($\theta$ near 1).
     Suppose $f(t) = f(t-12)$ and $Y_t = f(t) + e_t$. Then $Y_t = Y_{t-12} + e_t - \theta e_{t-12}$ where $\theta = 1$ ($w = 1 - \theta = 0$).
     Our $w$ was near 0. Try ARIMA with dummy variables mn1 - mn11.

         %let mlist = mn1 mn2 mn3 mn4 mn5 mn6 mn7 mn8 mn9 mn10 mn11;
         proc arima data=reg;
            identify var=snow crosscor=(&mlist);
            estimate input=(&mlist);
            forecast lead=48 id=date interval=month out=outfor;
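
The claim that a fixed seasonal pattern plus noise corresponds to a moving-average coefficient of exactly 1 can be verified by seasonal differencing. The Python sketch below simulates such a series; the twelve monthly means and the noise standard deviation are hypothetical, chosen only to resemble a snowfall pattern.

```python
import random

random.seed(1)

# Hypothetical fixed monthly pattern f(t) with f(t) = f(t-12).
monthly_mean = [12, 9, 11, 7, 2, 0, 0, 0, 1, 4, 8, 10]
n = 48

e = [random.gauss(0, 1) for _ in range(n)]
y = [monthly_mean[t % 12] + e[t] for t in range(n)]

# Seasonal difference of the data versus e_t - theta*e_{t-12} with theta = 1.
lhs = [y[t] - y[t - 12] for t in range(12, n)]
rhs = [e[t] - 1.0 * e[t - 12] for t in range(12, n)]

max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_gap)  # zero up to floating-point rounding
```

The seasonal pattern cancels in the difference, leaving only the errors with coefficient 1, so a smoothing weight near 0 is a signal that fixed monthly dummy variables describe the seasonality.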

  32. Residuals look great. Forecasts VERY similar to ESM!

  33. Smoothing for trend plus seasonal data: Winters additive and multiplicative models. Start from a local linear trend, then either add local monthly effects with average 0, or multiply the local linear trend by monthly factors with average 1.

  34. Unobserved Components Models (PROC UCM)
     Idea: express components as recursions.
     • $Y_t = Y_{t-1}$, $Y_0 = a$: Y is a horizontal line at $a$.
     • $Y_t = b + Y_{t-1}$, $Y_0 = a$, $Y_1 = a + b$: Y is a line $a + bt$.
     • $Y_t = Y_{t-12}$, $Y_0 = S_1$, $Y_1 = S_2$, …, $Y_{11} = S_{12}$: Y is periodic (seasonal).
     Idea: make these flexible by adding error terms (error variance 0 implies deterministic).
     Idea: model = sum of components.
     Note: recursions are unit root type, like the random walk $Y_t = Y_{t-1} + e_t$.

  35. Unobserved Components Models (PROC UCM). Idea: express as mean component $m_t$ and trend component $b_t$. Matrix recursions or

  36. Generated example: stochastic vs. deterministic components
     • Stochastic: Y = level + trend + e. Standard deviations: e 3, level 1, trend 0.1.
     • Deterministic: Y = level + trend + e. Standard deviations: e 3, level 0, trend 0.
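
A generated example of this kind can be sketched with a local linear trend recursion. The Python code below is one plausible way to produce such series, not the code behind the slide: it assumes the level moves by the current slope plus a level error, the slope moves by a slope error, and the observation adds the irregular term e. The starting level 0 and slope 0.5 are hypothetical; the standard deviations are the ones quoted above.

```python
import random


def simulate(n, sd_e, sd_level, sd_trend, level0=0.0, trend0=0.5, seed=0):
    """Return (signal, observed) for a local linear trend with the given error SDs."""
    rng = random.Random(seed)
    level, trend = level0, trend0
    signal, observed = [], []
    for _ in range(n):
        level = level + trend + rng.gauss(0, sd_level)  # level follows the slope, plus error
        trend = trend + rng.gauss(0, sd_trend)          # slope drifts as a random walk
        signal.append(level)
        observed.append(level + rng.gauss(0, sd_e))     # add the irregular term
    return signal, observed


stochastic_signal, stochastic_y = simulate(100, sd_e=3, sd_level=1, sd_trend=0.1)
deterministic_signal, deterministic_y = simulate(100, sd_e=3, sd_level=0, sd_trend=0)

print(deterministic_signal[:3], deterministic_signal[-1])
```

With both component variances set to zero the signal is an exact straight line and only the irregular term is random, which is the deterministic case; nonzero variances let the level and slope wander.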

  37. Dow Jones, Nov. 1, 2017 to March 6, 2018. Plot annotation: Feb. 5, 2018.

  38. UCM fit to the Dow Jones closing values.

         proc ucm data=dow plot=all;
            id date interval=weekday;
            model close;
            level;
            slope;
            season type=dummy length=5;
            irregular;   ** error term **;
         run;

         Final Estimates of the Free Parameters
                                                     Approx                Approx
         Component   Parameter        Estimate    Std Error   t Value   Pr > |t|
         Irregular   Error Variance    0.82940      3.05842      0.27     0.7862
         Level       Error Variance      69907      11126.3      6.28     <.0001
         Slope       Error Variance    0.01416            .         .          .
         Season      Error Variance    0.02507            .         .          .

     Slope, seasonal and irregular seem to be deterministic (no variance).

  39. Refit with the slope and seasonal variances fixed at 0, then with level only.

         proc ucm data=dow plot=all;
            id date interval=weekday;
            model close;
            level;
            slope variance=0 noest;
            season type=dummy length=5 variance=0 noest;

         proc ucm data=dow plot=all;
            id date interval=weekday;
            model close;
            level;

     (Recall: level with error $e_t$ is $Y_t = Y_{t-1} + e_t$, a random walk!)

  40. Significance Analysis of Components (Based on the Final State)

         Component   DF   Chi-Square   Pr > ChiSq
         Level        1       382800       <.0001
         Slope        1         0.33       0.5633
         Season       4         0.17       0.9966

     Slope and seasonal can be omitted $\Rightarrow$ DJIA is just a random walk.

         Outlier Summary
         Obs   date        Break Type          Estimate     DF   Pr > ChiSq
          69   05FEB2018   Additive Outlier   -933.57192     1       <.0001

  41. Employment example.

         proc ucm data=employment plots=all;
            id date interval=month;
            model employed;
            level;
            slope;
            season type=dummy length=12 variance=0 noest;
            title3 "Dummy approach";
            forecast lead=24 plot=decomp outfor=for;
         run;

  42. Smoothed level plot with forecast

  43. Contact Information • Name: David A. Dickey • Company: NC State University • City/State: Raleigh NC • Phone: (919) 515-1925 • Email: dickey@stat.ncsu.edu

  44. OPTIONAL (time permitting) Actual tides at Wilmington NC and NOAA forecasts. Vertical lines mark midnights.

  45. Actual tides minus NOAA forecasts. Can we improve accuracy by predicting these deviations with ESM?? Maybe seasonal additive model?

  46. Additive seasonal smoothing of the NOAA forecast errors.

         proc esm data=cut lead=48 out=out1 outest=betas;
            where hour < "26jan2018:0:0"dt;
            id hour interval=hour;
            forecast A_NOAAP / model=addseasonal;

     NOAA: epa = NOAA - actual. Improved: eia = improved - actual.

         Sums of squared errors
         Variable          USS
         epa        10.6752300
         eia         3.6772151
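
The size of the improvement follows from the two sums of squares. This Python sketch only redoes that arithmetic on the printed USS values; expressing it as a percentage reduction is an added convenience, not a figure from the slide.

```python
uss_noaa = 10.6752300      # USS of epa = NOAA - actual
uss_improved = 3.6772151   # USS of eia = improved - actual

ratio = uss_improved / uss_noaa   # remaining share of squared error
reduction = 1 - ratio             # share of squared error removed

print(round(ratio, 3), round(100 * reduction, 1))
```

On these numbers the smoothed correction removes roughly two thirds of the squared forecast error over the evaluation window.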
