Data Deluge in Times Series Analysis ("I need 1,000+ forecasts by end of day!") - PowerPoint PPT Presentation

Presentation Transcript

  1. Data Deluge in Times Series Analysis ("I need 1,000+ forecasts by end of day!")

  2. Overview • Mass Forecasting vs. Data Mining • Good ol’ fashioned forecasting • ARIMA(X), ESMs, UCMs • Data deluge (real world challenges) • Building a good mass forecasting system • 1st attempts • “The kitchen sink” • Intelligent automation

  3. Predictive Modeling (Data Mining) vs. Forecasting Predictive Modeling

  4. Predictive Modeling (Data Mining) vs. Forecasting Forecasting

  5. Predictive Modeling (Data Mining) vs. Forecasting • Accuracy • Profit • AROC (c-stat) • KS • RMSE, SBC, AIC • SE, SP • GAIN • LIFT • Etc.

  6. Predictive Modeling (Data Mining) vs. Forecasting • TIME • RMSE • MAE • MPE • MAPE • APE • WAPE • SBC, BIC • R² • Random Walk R² • …and on and on and on…
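
A few of the fit measures named on this slide can be sketched in a few lines. This is a minimal illustration, not the definitions used by any particular SAS procedure; the function name `fit_measures` and the sample numbers are made up, and errors are taken as actual minus forecast.

```python
import math

def fit_measures(actual, forecast):
    """Return RMSE, MAE, MPE, and MAPE for paired actual/forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Percentage measures divide by the actual value, so actuals must be nonzero.
    mpe = 100.0 * sum(e / a for e, a in zip(errors, actual)) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"RMSE": rmse, "MAE": mae, "MPE": mpe, "MAPE": mape}

# Hypothetical actuals and forecasts, for illustration only.
measures = fit_measures([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
print(measures)
```

MPE keeps the sign of each error, so over- and under-forecasts can cancel; MAPE does not, which is why the two can differ sharply on the same series.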

  7. Overview • Mass Forecasting vs. Data Mining • Good ol’ fashioned forecasting • ARIMA(X), ESMs, UCMs • Data deluge (real world challenges) • Building a good mass forecasting system • 1st attempts • “The kitchen sink” • Intelligent automation

  8. The Universal Univariate Time Series Model • TREND • ERROR (Irregular) • SEASONAL • TRANSFORMATION

  9. Additive Decomposition of the Airline Data • T: Linear Trend • S: Seasonal Average • I: Irregular Component
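
One simple way to produce a decomposition of this shape (linear trend, seasonal averages, irregular remainder) is sketched below. The slide does not say how its decomposition was computed, so the least-squares trend and the helper name `additive_decomposition` are assumptions, and the data are made up rather than the airline series.

```python
def additive_decomposition(y, season_length):
    """Split y into linear trend T, seasonal averages S, and irregular I, with y = T + S + I."""
    n = len(y)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    # Ordinary least squares slope and intercept for the linear trend.
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    intercept = y_mean - slope * t_mean
    trend = [intercept + slope * ti for ti in t]
    detrended = [yi - tr for yi, tr in zip(y, trend)]
    # Seasonal component: average detrended value at each position in the cycle.
    seasonal_avg = []
    for s in range(season_length):
        vals = detrended[s::season_length]
        seasonal_avg.append(sum(vals) / len(vals))
    seasonal = [seasonal_avg[i % season_length] for i in range(n)]
    # Irregular component: whatever trend and season leave unexplained.
    irregular = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, irregular

series = [10.0, 14.0, 12.0, 16.0, 14.0, 18.0, 16.0, 20.0]  # made-up data, period 2
trend, seasonal, irregular = additive_decomposition(series, 2)
```

By construction the three components add back to the original series at every time point.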

  10. Some SAS/ETS Procedures • ARIMA: AutoRegressive Integrated Moving Average models; dynamic regression models (transfer function models) • AUTOREG: simple regression models with autoregressive errors; ARCH and GARCH models (not covered) • FORECAST: ESM and autoregression models • UCM: unobserved components models • SPECTRA: spectral analysis • MODEL: nonlinear modeling

  11. Simple Exponential Smoothing Weights • Weights are applied to the past values Y1 through Y8 to predict Y9. • The larger the smoothing parameter, the more the most recent values are emphasized.
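
The effect of the smoothing parameter on the weights can be seen numerically. In simple exponential smoothing, the value j steps back from the most recent one gets weight alpha * (1 - alpha)^j; the symbol `alpha` and the function name are assumptions here, since the slide's own notation did not survive in the transcript.

```python
def ses_weights(alpha, n_past):
    """Weights on the n_past most recent values, most recent first."""
    return [alpha * (1 - alpha) ** j for j in range(n_past)]

low = ses_weights(0.2, 8)   # weights on Y8, Y7, ..., Y1 for a small parameter
high = ses_weights(0.8, 8)  # same, for a large parameter

for name, weights in (("alpha=0.2", low), ("alpha=0.8", high)):
    print(name, [round(w, 4) for w in weights])
```

With the larger parameter almost all of the weight sits on the last one or two observations, while the smaller parameter spreads it over a longer history.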

  12. ESM Models (ESM: smoothing parameters) • Simple: level • Double: level • Linear (Holt): level, trend • Damped-Trend: level, trend, damping • Seasonal: level, season • Additive Winters: level, trend, season • Multiplicative Winters: level, trend, season

  13. Unobserved Components Models (UCMs) • Also known as structural time series models • Decompose time series into four components: • trend • season • cycle • irregular • General form: • Yt = Trend + Season + Cycle + Regressors

  14. UCMs • Each component captures some important feature of the series dynamics. • Components in the model have their own models. • Each component has its own source of error. • The coefficients for trend, season, and cycle are dynamic. • The coefficients are testable. • Each component has its own forecasts.

  15. UCM Procedure

          PROC UCM DATA=SAS-data-set ;
             ID variable INTERVAL=interval ;
             MODEL variable <=variables> ;
             IRREGULAR <options> ;
             LEVEL <options> ;
             SLOPE <options> ;
             SEASON LENGTH=n TYPE=DUMMY|TRIG <options> ;
             CYCLE <options> ;
             ESTIMATE OUTEST=SAS-data-set <options> ;
             FORECAST OUTFOR=SAS-data-set LEAD=n <options> ;
          RUN;

  16. Box-Jenkins ARIMAX Models • ARIMAX: AutoRegressive Integrated Moving Average with eXogenous variables • AR (Autoregressive): The time series is a function of its own past. • MA (Moving Average): The time series is a function of past shocks (deviations, innovations, errors, and so on). • I (Integrated): Differencing provides stochastic trend and seasonal components, so forecasting requires integration (undifferencing). • X (Exogenous): The time series is influenced by external factors. (These input variables can actually be endogenous or exogenous.)

  17. Autoregressive Moving Average Models • A time series that is a linear function of p past values plus a linear combination of q past errors is called an autoregressive moving average process of order (p,q), denoted ARMA(p,q).
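
Written out, the verbal definition above corresponds to an equation of the following form. The symbols (c for a constant, phi for autoregressive coefficients, theta for moving average coefficients, epsilon for the errors) and the minus signs on the moving average terms follow the common Box-Jenkins convention and are an assumption; the slide's own equation is not in the transcript.

```latex
Y_t = c + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p}
      + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q}
```

Setting q = 0 gives a pure AR(p) model, and setting p = 0 gives a pure MA(q) model.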

  18. Autocorrelation Plots for an AR(2) Time Series

  19. Additive Decomposition of the Airline Data • T: Linear Trend • S: Seasonal Average • I: Irregular Component

  20. Diagnosing Trend (plot of the series by Year)

  21. Deterministic Trends: Add a function of time • Linear Trend • Quadratic Trend (each plotted as Y_t against time)

  22. Stochastic Trends: Use differencing • Take a simple difference • Plots: random walk with drift (RWD); first-differenced RWD
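
The effect of a simple difference on a random walk with drift can be checked with a short simulation. This is a sketch with made-up parameters (drift 0.5, unit error variance); the function names are illustrative.

```python
import random

def simulate_rwd(n, drift, sigma, seed=0):
    """Simulate Y_t = Y_{t-1} + drift + e_t, starting from Y_0 = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        y.append(y[-1] + drift + rng.gauss(0.0, sigma))
    return y

def difference(y, lag=1):
    """Return y_t - y_{t-lag}; the result is lag values shorter than y."""
    return [y[i] - y[i - lag] for i in range(lag, len(y))]

walk = simulate_rwd(n=500, drift=0.5, sigma=1.0)
diffed = difference(walk)
mean_diff = sum(diffed) / len(diffed)
print(round(mean_diff, 3))  # an estimate of the drift
```

The differenced series is just drift plus noise, so it fluctuates around a constant level instead of wandering the way the original walk does.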

  23. Seasonality? • Use $Y_t - Y_{t-S}$, called a difference of order S. • Add seasonal dummy variables
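
A difference of order S subtracts the value from one full season earlier. The sketch below uses a made-up quarterly series (S = 4) built from a fixed seasonal pattern plus a steady increase; the names are illustrative.

```python
def seasonal_difference(y, s):
    """Return y_t - y_{t-s} for t = s, ..., len(y) - 1."""
    return [y[t] - y[t - s] for t in range(s, len(y))]

# Made-up quarterly data: rises by 1 per quarter, with a repeating seasonal pattern.
pattern = [5.0, -3.0, 2.0, -4.0]
y = [t + pattern[t % 4] for t in range(12)]

d4 = seasonal_difference(y, 4)
print(d4)
```

Because the seasonal pattern repeats exactly every four quarters, it cancels in the difference, and only the year-over-year increase remains.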

  24. ARIMA Procedure

          PROC ARIMA DATA=SAS-data-set ;
             BY variables ;
             IDENTIFY VAR=variable CROSS=(variables) NLAGS=n <options> ;
             ESTIMATE P=n Q=n INPUT=(variables) METHOD=CLS|ML|ULS <options> ;
             FORECAST OUT=SAS-data-set ID=variable INTERVAL=interval LEAD=n <options> ;
          RUN;
          QUIT;

  25. Overview • Mass Forecasting vs. Data Mining • Good ol’ fashioned forecasting • ARIMA(X), ESMs, UCMs • Data deluge (real world challenges) • Building a good mass forecasting system • 1st attempts • “The kitchen sink” • Intelligent automation

  26. Large-Scale Forecasting • Modern businesses require efficient, reliable forecasts for many series. These forecasts usually need to be updated on a regular basis. • There are too many series to manually implement the textbook approach for each one. • The series might be hierarchically arranged and require reconciliation of forecasts at different levels.

  27. Forecasting a Large Number of Time Series • Compared to the single time series forecasting problem, when there are many time series to be forecast, the following conditions might occur: • There are not enough skilled analysts to provide forecasts for each series using conventional techniques. • Frequent forecast updates are usually required. • Time-stamped data must be converted to time-series data and managed automatically. • Exogenous variables or calendar events might influence the time series and must be included in automatic model selection.

  28. Large-Scale Forecasting Scenario (Time Series Data) • 80% can be forecast automatically. • 10% requires extra effort. • 10% cannot be forecast accurately.

  29. Overview • Mass Forecasting vs. Data Mining • Good ol’ fashioned forecasting • ARIMA(X), ESMs, UCMs • Data deluge (real world challenges) • Building a good mass forecasting system • 1st attempts • “The kitchen sink” • Intelligent automation

  30. A Good Mass Forecasting System • Requirements • Prepare Time Series Data • Fit many models • Allow for ‘sophisticated’ user defined models • Provide a variety of fit measures • Allow for hold-out assessment • Automatically pick ‘best’ models • Auto-diagnose data • Automatically incorporate events and inputs • Accommodate and reconcile hierarchies

  31. Data Preparation: Equally Spaced Time Series • Equally spaced time series • Equally spaced time series with missing values • Unequally spaced time series

  32. Accumulate, Aggregate & Reconcile • Accumulate Data (for example, Daily to Monthly) • Aggregate Data • Statistical Forecast • Reconcile Forecasts (Reconciled Forecast)
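
Two of the steps on this slide can be sketched briefly: accumulating time-stamped daily records into monthly totals, and reconciling lower-level forecasts so that they add up to a higher-level forecast. The proportional top-down rule shown is only one of several reconciliation approaches and is an assumption here, as are the function names and the data.

```python
from collections import defaultdict
from datetime import date

def accumulate_monthly(records):
    """Sum (date, value) records into {(year, month): total}."""
    totals = defaultdict(float)
    for day, value in records:
        totals[(day.year, day.month)] += value
    return dict(sorted(totals.items()))

def reconcile_top_down(parent_forecast, child_forecasts):
    """Rescale child forecasts proportionally so they sum to the parent forecast."""
    total = sum(child_forecasts.values())
    return {name: parent_forecast * f / total for name, f in child_forecasts.items()}

daily = [
    (date(2024, 1, 30), 3.0),
    (date(2024, 1, 31), 2.0),
    (date(2024, 2, 1), 5.0),
    (date(2024, 2, 15), 1.0),
]
monthly = accumulate_monthly(daily)

# Hypothetical regional forecasts that do not add up to the national forecast of 100.
reconciled = reconcile_top_down(100.0, {"east": 30.0, "west": 50.0})
print(monthly, reconciled)
```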

  33. TIMESERIES Procedure

          PROC TIMESERIES DATA=SAS-data-set OUT=SAS-data-set
                          OUTDECOMP=SAS-data-set OUTSEASON=SAS-data-set
                          OUTTREND=SAS-data-set SEASONALITY=n
                          PRINT=(<options>) ;
             BY variables ;
             VAR variables ;
             DECOMP <TC><SC><IC><components> / MODE=ADD|MULT|mode ;
             ID variable INTERVAL=interval ;
          RUN;

  34. FORECAST Procedure

          PROC FORECAST DATA=SAS-data-set OUT=SAS-data-set OUTEST=SAS-data-set
                        TREND=1|2|3 METHOD=STEPAR|method-name AR=n
                        SLENTRY=value SLSTAY=value
                        INTERVAL=interval LEAD=n <options> ;
             BY variables ;
             ID variables ;
             VAR variables ;
          RUN;

  35. Time Series Forecasting System

  36. Summary of Data Used for Forecast Model Building • Fit Sample: used to estimate model parameters for accuracy evaluation; used to forecast values in the holdout sample • Holdout Sample: used to evaluate model accuracy; simulates a retrospective study • Full = Fit + Holdout: the full data is used to fit the deployment model
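
The fit/holdout workflow can be sketched as follows: fit each candidate on the fit sample, score it on the holdout sample, pick the winner, then refit the winner on the full series for deployment. The two candidate models (a naive last-value forecast and a historical mean) are deliberately trivial stand-ins, and the data are made up.

```python
def mean_abs_error(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def naive_forecast(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon

def mean_forecast(history, horizon):
    """Repeat the historical mean."""
    return [sum(history) / len(history)] * horizon

series = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
holdout_size = 3
fit, holdout = series[:-holdout_size], series[-holdout_size:]

candidates = {"naive": naive_forecast, "mean": mean_forecast}
# Each candidate sees only the fit sample and is scored on the holdout sample.
scores = {
    name: mean_abs_error(holdout, model(fit, holdout_size))
    for name, model in candidates.items()
}
best = min(scores, key=scores.get)

# The deployment model uses the full series (fit + holdout).
deployment_forecast = candidates[best](series, holdout_size)
print(scores, best, deployment_forecast)
```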

  37. Overview • Mass Forecasting vs. Data Mining • Good ol’ fashioned forecasting • ARIMA(X), ESMs, UCMs • Data deluge (real world challenges) • Building a good mass forecasting system • 1st attempts • “The kitchen sink” • Intelligent automation

  38. A Bigger List

  39. Box-Jenkins Model Components Theoretical ARIMAX Model

  40. The Backshift Operator • The backshift operator $B^k$ shifts a time series by k time units. • Shift 1 time unit: $B Y_t = Y_{t-1}$ • Shift 2 time units: $B^2 Y_t = Y_{t-2}$ • Shift k time units: $B^k Y_t = Y_{t-k}$ • Backshift operator notation is a convenient way to write ARMA models.
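
On a finite series stored as a list, the backshift operator amounts to a lag. A minimal sketch follows; the function name is illustrative, and `None` marks positions where no earlier value exists.

```python
def backshift(y, k=1):
    """Apply B^k: element t of the result is y[t-k], or None where no value exists."""
    n = len(y)
    return [None] * min(k, n) + list(y[: max(n - k, 0)])

y = [1.0, 2.0, 3.0, 4.0]
b1 = backshift(y, 1)             # shift 1 time unit
b2 = backshift(y, 2)             # shift 2 time units
b1_twice = backshift(b1, 1)      # applying B twice is the same as B^2
print(b1, b2, b1_twice)
```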

  41. Additive Decomposition of the Airline Data • T: Linear Trend • S: Seasonal Average • I: Irregular Component

  42. The Dickey-Fuller Single Mean Test • Model: • Null Hypothesis: • Alternative Hypothesis:
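
The equations for this slide are not in the transcript. A common textbook statement of the single mean (nonzero mean, no trend) Dickey-Fuller setup is given below as an assumption, with mu for the series mean, rho for the autoregressive coefficient, and epsilon for the error term; the slide's own notation may differ.

```latex
\text{Model: } Y_t - \mu = \rho\,(Y_{t-1} - \mu) + \varepsilon_t
\qquad H_0:\ \rho = 1
\qquad H_1:\ |\rho| < 1
```

Under the null hypothesis the series has a unit root and calls for differencing; rejecting it supports a series that is stationary around the mean.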

  43. The Dickey-Fuller Test in PROC ARIMA

          Augmented Dickey-Fuller Unit Root Tests

          Type          Lags        Rho   Pr < Rho     Tau   Pr < Tau       F   Pr > F
          Zero Mean        0    -0.0214     0.6739   -0.12     0.6395
                           1    -0.0669     0.6636   -0.41     0.5309
                           2    -0.0265     0.6726   -0.22     0.6026
                           3    -0.0316     0.6713   -0.31     0.5682
                           4    -0.0152     0.6749   -0.18     0.6174
                           5     0.0005     0.6783    0.01     0.6803
          Single Mean      0   -25.0564     0.0012   -4.01     0.0027    8.03   0.0010
                           1   -41.5691     0.0004   -4.97     0.0002   12.43   0.0010
                           2   -34.8515     0.0004   -3.66     0.0075    6.73   0.0073
                           3   -50.5816     0.0004   -3.75     0.0059    7.09   0.0010
                           4   -53.8412     0.0004   -3.20     0.0260    5.13   0.0428
                           5   -53.2356     0.0004   -2.71     0.0803    3.67   0.1669
          Trend            0   -24.9941     0.0110   -3.95     0.0167    7.89   0.0167
                           1   -40.8845     <.0001   -4.90     0.0012   12.60   0.0010
                           2   -34.4841     0.0005   -3.66     0.0350    6.94   0.0452
                           3   -48.3846     <.0001   -3.81     0.0247    7.83   0.0247
                           4   -51.7982     <.0001   -3.31     0.0767    5.91   0.0868
                           5   -54.5918     <.0001   -2.83     0.1951    4.18   0.3694

          continued...

  44. ARMA Order Determining Methods • Extended Sample Autocorrelation Function (ESACF) • Minimum Information Criterion (MINIC) • Smallest Canonical Correlation (SCAN)

  45. Event Examples • Retail promotions • Advertising campaigns • Negative articles in major publications • Natural or man-made disasters • Mergers and acquisitions • Government legislated policy changes • Organizational personnel and/or policy changes • Christmas • Strikes • Scandal • Injury, illness, or death of a key player (such as a CEO, CFO, or chief scientist)

  46. Effects of 9/11 on Airline Passengers

  47. Effects of Katrina/Rita on Oil Production

  48. Primary Event Variables • Point/Pulse • Step • Ramp (each defined relative to the event time, t_event)
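
The three primary event variables can be built as simple dummy series around the event time. This sketch assumes the usual definitions (pulse is 1 only at the event, step is 1 from the event onward, ramp counts periods elapsed since the event); the function names and the integer time index are illustrative.

```python
def pulse(n, t_event):
    """1 at the event time only, 0 elsewhere."""
    return [1.0 if t == t_event else 0.0 for t in range(n)]

def step(n, t_event):
    """0 before the event, 1 from the event time onward."""
    return [1.0 if t >= t_event else 0.0 for t in range(n)]

def ramp(n, t_event):
    """0 up to the event time, then increasing by 1 each period."""
    return [float(max(t - t_event, 0)) for t in range(n)]

n, t_event = 6, 2
print(pulse(n, t_event))
print(step(n, t_event))
print(ramp(n, t_event))
```

Series like these can then be supplied to a model as input variables, so the event's effect is estimated rather than left in the errors.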

  49. Events and Outliers • If an event has not been formally specified, PROC ARIMA can identify events as outliers. • Three types of outliers are included in the search: ADDITIVE outlier (AO), level SHIFT (LS), and TEMPorary change (TC). (ADDITIVE, SHIFT, and TEMP are the primary keywords, and AO, LS, and TC are accepted variants.)

          PROC ARIMA <options> ;
             IDENTIFY VAR=variable <options> ;
             ESTIMATE <options> ;
             OUTLIER TYPE=(AO|LS|TC) <options> ;
             FORECAST OUT=SAS-data-set <options> ;
          RUN;

  50. Abrupt, Temporary Effect