
Time Series


Presentation Transcript


  1. Math 419/592 Winter 2009 Prof. Andrew Ross Eastern Michigan University Time Series

  2. Overview of Stochastic Models

  3. Take Math 560 (Optimization) this fall! Sign up soon or it will disappear. But first, a word from our sponsor.

  4. Outline • Look at the data! • Common Models • Multivariate Data • Cycles/Seasonality • Filters

  5. or else! Look at the data!

  6. Atmospheric CO2. Years: 1958 to now; vertical scale 300 to 400ish (ppm)

  7. Ancient sunspot data

  8. Our Basic Procedure • Look at the data • Quantify any pattern you see • Remove the pattern • Look at the residuals • Repeat at step 2 until no patterns left

  9. Our basic procedure, version 2.0 • Look at the data • Suck the life out of it • Spend hours poring over the noise • What should noise look like?

  10. One of these things is not like the others

  11. Stationarity • The upper-right-corner plot is Stationary. • Mean doesn't change in time • no Trend • no Seasons (known frequency) • no Cycles (unknown frequency) • Variance doesn't change in time • Correlations don't change in time • Up to here, weakly stationary • Joint Distributions don't change in time • That makes it strongly stationary

  12. Our Basic Notation • Time is “t”, not “n” • even though it's discrete • State (value) is Y, not X • to avoid confusion with x-axis, which is time. • Value at time t is Y_t, not Y(t) • because time is discrete • Of course, other books do other things.

  13. Detrending: deterministic trend • Fit a plain linear regression, then subtract it out: • Fit Y_t = m*t + b, • New data is Z_t = Y_t - m*t - b • Or use quadratic fit, exponential fit, etc.
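
A minimal numpy sketch of this recipe (the fake data and variable names are illustrative, not from the slides):

    import numpy as np

    t = np.arange(100)                                 # time index
    y = 2.0 * t + 5.0 + np.random.normal(size=100)     # fake data with a linear trend
    m, b = np.polyfit(t, y, 1)                         # fit Y_t = m*t + b by least squares
    z = y - (m * t + b)                                # detrended data Z_t = Y_t - m*t - b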

  14. Detrending: stochastic trend • Differencing • For linear trend, new data is Z_t = Y_t - Y_{t-1} • To remove quadratic trend, do it again: • W_t = Z_t - Z_{t-1} = Y_t - 2Y_{t-1} + Y_{t-2} • Like taking derivatives • What’s the equivalent if you think the trend is exponential, not linear? • Hard to decide: regression or differencing?
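
The same idea in numpy, on a made-up random-walk series (np.diff does the subtraction):

    import numpy as np

    y = np.cumsum(np.random.normal(size=200))   # random walk: a stochastic trend
    z = np.diff(y)                              # Z_t = Y_t - Y_{t-1}
    w = np.diff(z)                              # difference again for a quadratic trend
    # One answer to the exponential-trend question above: take logs first,
    # then difference (requires positive data):
    # z = np.diff(np.log(y))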

  15. Removing Cycles/Seasons • Will get to it later. • For the next few slides, assume no cycles/seasons.

  16. A brief big-picture moment • How do you compare two quantities? • Multiply them! • If they’re both positive, you’ll get a big, positive answer • If they’re both big and negative… • If one is positive and one is negative… • If one is big&positive and the other is small&positive…

  17. Where have we seen this? • Dot product of two vectors • Proportional to the cosine of the angle between them (do they point in the same direction?) • Inner product of two functions • Integral from a to b of f(x)*g(x) dx • Covariance of two data sets x_i, y_i • Sum_i (x_i * y_i), with the means subtracted first (or assumed zero)

  18. Autocorrelation Function • How correlated is the series with itself at various lag values? • E.g. If you plot Y_{t+1} versus Y_t and find the correlation, that's the correl. at lag 1 • ACF lets you calculate all these correls. without plotting at each lag value. • ACF is a basic building block of time series analysis.
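
To see the lag-1 case concretely, a quick check on stand-in data (white noise, so the result should be near 0):

    import numpy as np

    y = np.random.normal(size=500)            # stand-in data series
    r1 = np.corrcoef(y[:-1], y[1:])[0, 1]     # correlation of Y_t with Y_{t+1}: the ACF at lag 1
    print(r1)                                 # near 0 for white noise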

  19. Fake data on bus inter-arrival times (IATs)

  20. Properties of ACF • At lag 0, ACF=1 • Symmetric around lag 0 • Approx. confidence-interval bars around ACF=0 • To help you decide when ACF drops to near-0 • Less reliable at higher lags • Often assume ACF dies off fast enough so its absolute sum is finite. • If not, called “long-term memory”; e.g. • River flow data over many decades • Traffic on computer networks

  21. How to calculate ACF • R, Splus, SAS, SPSS, Matlab, Scilab will do it for you • Excel: download PopTools (free!) • http://www.cse.csiro.au/poptools/ • Excel, etc: do it yourself. • First find avg. and std.dev. of data • Next, find AutoCoVariance Function (ACVF) • Then, divide by variance of data to get ACF

  22. ACVF at lag h • ACVF(h) = (1/N) * Sum_{t=1}^{N-h} (Y_t - Ybar)(Y_{t+h} - Ybar) • Ybar is the mean of the whole data set • Not just the mean of the N-h data points in the sum • Old way: divide the sum by N-h; can produce correl > 1 • New way: divide by N, as above • The difference is “End Effects” • Pg 30 of Peña, Tiao, Tsay • (if it makes a difference, you're up to no good?)
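
A do-it-yourself sketch of slides 21-22 in numpy, with both normalizations and the approximate confidence bars from slide 20 (function and variable names are mine, not from the slides):

    import numpy as np

    def acf(y, max_lag, end_effects=True):
        # ACVF(h) = sum over t of (Y_t - Ybar)(Y_{t+h} - Ybar) / denom, then divide by ACVF(0)
        y = np.asarray(y, dtype=float)
        n = len(y)
        d = y - y.mean()                           # Ybar is the mean of the WHOLE series
        acvf = np.empty(max_lag + 1)
        for h in range(max_lag + 1):
            denom = n if end_effects else n - h    # new way (N) vs. old way (N-h)
            acvf[h] = np.sum(d[:n - h] * d[h:]) / denom
        return acvf / acvf[0]                      # divide by the variance to get the ACF

    y = np.random.normal(size=200)
    r = acf(y, 20)
    band = 1.96 / np.sqrt(len(y))                  # approx. 95% bars around ACF = 0
    print(np.abs(r[1:]) < band)                    # mostly True for white noise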

  23. Common Models • White Noise • AR • MA • ARMA • ARIMA • SARIMA • ARMAX • Kalman Filter • Exponential Smoothing, trend, seasons

  24. White Noise • Sequence of I.I.D. variables e_t • mean = zero, finite std.dev., often unknown • Often, but not always, Gaussian

  25. AR: AutoRegressive • Order 1: Y_t = a*Y_{t-1} + e_t • E.g. New = (90% of old) + random fluctuation • Order 2: Y_t = a_1*Y_{t-1} + a_2*Y_{t-2} + e_t • Order p denoted AR(p) • p=1,2 common; >2 rare • AR(p) like p'th order ODE • AR(1) not stationary if |a|>=1 • E[Y_t] = 0, can generalize
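
Simulating an AR(1) takes a few lines; a sketch with a made-up a = 0.9 (so |a| < 1 and the series is stationary):

    import numpy as np

    a, n = 0.9, 500
    e = np.random.normal(size=n)              # white noise e_t
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = a * y[t - 1] + e[t]            # Y_t = a*Y_{t-1} + e_t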

  26. Things to do with AR • Find appropriate order • Estimate coefficients • via Yule-Walker eqn. • Estimate std.dev. of white noise • If estimated |a|>0.98, try differencing.
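
For AR(1), the Yule-Walker equations reduce to: estimated a = lag-1 autocorrelation, and the white-noise variance follows from Var(Y)*(1 - a^2). A self-contained sketch on simulated data (all names illustrative):

    import numpy as np

    n, a = 2000, 0.9                          # simulate an AR(1), then recover a
    e = np.random.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = a * y[t - 1] + e[t]

    d = y - y.mean()
    a_hat = np.sum(d[:-1] * d[1:]) / np.sum(d * d)   # Yule-Walker estimate: lag-1 autocorrelation
    sigma2_hat = d.var() * (1 - a_hat**2)            # estimated white-noise variance
    print(a_hat, sigma2_hat)                         # close to 0.9 and 1.0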

  27. MA: Moving Average • Order 1: • Y_t = b_0*e_t + b_1*e_{t-1} • Order q: MA(q) • In real data, much less common than AR • But still important in theory of filters • Stationary regardless of b values • E[Y_t] = 0, can generalize

  28. ACF of an MA process • Drops to zero after lag=q • That's a good way to determine what q should be!
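
A quick empirical check of the lag-q cutoff, using a made-up MA(1) with b_0 = 1 and b_1 = 0.6:

    import numpy as np

    n, b0, b1 = 5000, 1.0, 0.6
    e = np.random.normal(size=n + 1)
    y = b0 * e[1:] + b1 * e[:-1]                     # MA(1): Y_t = b_0*e_t + b_1*e_{t-1}

    d = y - y.mean()
    r = [np.sum(d[:n - h] * d[h:]) / np.sum(d * d) for h in range(4)]
    print(np.round(r, 2))     # lag 1 near b1/(1 + b1^2), about 0.44; lags 2 and 3 near 0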

  29. ACF of an AR process? • Never completely dies off, not useful for finding order p. • AR(1) has exponential decay in ACF • Instead, use Partial ACF=PACF, which dies after lag=p • PACF of MA never dies.

  30. ARMA • ARMA(p,q) combines AR and MA • Often p,q <= 1 or 2

  31. ARIMA • AR-Integrated-MA • ARIMA(p,d,q) • d=order of differencing before applying ARMA(p,q) • For nonstationary data w/stochastic trend

  32. SARIMA, ARMAX • Seasonal ARIMA(p,d,q)-and-(P,D,Q)S • Often S= • 12 (monthly) or • 4 (quarterly) or • 52 (weekly) • Or, S=7 for daily data inside a week • ARMAX=ARMA with outside explanatory variables (halfway to multivariate time series)

  33. State Space Model, Kalman Filter • Underlying process that we don't see • We get noisy observations of it • Like a Hidden Markov Model (HMM), but state is continuous rather than discrete. • AR/MA, etc. can be written in this form too. • State evolution (vector): S_t = F * S_{t-1} + h_t • Observations (scalar): Y_t = H * S_t + e_t
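
A scalar (one-dimensional) sketch of this model and its Kalman filter; all the numbers (F, H, the noise variances) are made up for illustration:

    import numpy as np

    F, H = 0.95, 1.0        # state-transition and observation coefficients
    q, r = 0.1, 1.0         # variances of the state noise h_t and observation noise e_t
    rng = np.random.default_rng()

    n = 200
    s = np.zeros(n)         # hidden state S_t (never observed directly)
    y = np.zeros(n)         # noisy observations Y_t
    for t in range(1, n):
        s[t] = F * s[t - 1] + rng.normal(scale=np.sqrt(q))
        y[t] = H * s[t] + rng.normal(scale=np.sqrt(r))

    # Kalman filter: track the state estimate m and its variance p
    m, p = 0.0, 1.0
    for t in range(n):
        m, p = F * m, F * F * p + q                      # predict S_t from S_{t-1}
        k = p * H / (H * H * p + r)                      # Kalman gain
        m, p = m + k * (y[t] - H * m), (1 - k * H) * p   # update using Y_t
    print(m, s[-1])                                      # estimate vs. true final state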

  34. ARCH, GARCH(p,q) • (Generalized) AutoRegressive Conditional Heteroskedastic (heteroscedastic?) • Like ARMA but variance changes randomly in time too. • Used for many financial models

  35. Exponential Smoothing • More a method than a model.

  36. Exponential Smoothing = EWMA • Very common in practice • Forecasting w/o much modeling of the process. • A_t = forecast of series at time t • Pick some parameter a between 0 and 1 • A_t = a*Y_t + (1-a)*A_{t-1} • or A_t = A_{t-1} + a*(error in period t) • Why call it “Exponential”? • Weight on Y_t at lag k is proportional to (1-a)^k
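
In code the recursion is one line; a sketch (the initialization A_0 = Y_0 is a common choice, not something the slide specifies):

    import numpy as np

    def ewma(y, a):
        A = np.empty(len(y))
        A[0] = y[0]                               # common initialization choice
        for t in range(1, len(y)):
            A[t] = a * y[t] + (1 - a) * A[t - 1]  # A_t = a*Y_t + (1-a)*A_{t-1}
        return A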

  37. How to determine the parameter • Train the model: try various values of a • Pick the one that gives the lowest sum of absolute forecast errors • The larger a is, the more weight given to recent observations • Common values are 0.10, 0.30, 0.50 • If best a is over 0.50, there's probably some trend or seasonality present
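
A sketch of that training loop, scoring each candidate a by the sum of absolute one-step forecast errors (the grid and the fake data are illustrative):

    import numpy as np

    def sum_abs_errors(y, a):
        A, total = y[0], 0.0
        for t in range(1, len(y)):
            total += abs(y[t] - A)            # A is the forecast of Y_t made at time t-1
            A = a * y[t] + (1 - a) * A        # then update the forecast
        return total

    y = np.cumsum(np.random.normal(size=300))    # stand-in training series
    grid = np.arange(0.05, 1.0, 0.05)
    best_a = min(grid, key=lambda a: sum_abs_errors(y, a))
    print(best_a)

On this random-walk series the best a comes out large, which is exactly the trend warning in the last bullet above.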

  38. Holt-Winters • Exponential smoothing: no trend or seasonality • Excel/Analysis Toolpak can do it if you tell it a • Holt's method: accounts for trend. • Also known as double-exponential smoothing • Holt-Winters: accounts for trend & seasons • Also known as triple-exponential smoothing

  39. Multivariate • Along with ACF, use Cross-Correlation • Cross-Correl is not 1 at lag=0 • Cross-Correl is not symmetric around lag=0 • Leading Indicator: one series' behavior helps predict another after a little lag • Leading means “coming before”, not “better than others” • Can also do cross-spectrum, aka coherence
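
A sketch of a cross-correlation function and a leading-indicator check (the series and the lag of 3 are made up):

    import numpy as np

    def cross_corr(x, y, h):
        # correlation of x_t with y_{t+h}; not symmetric in h, and not 1 at h = 0
        if h >= 0:
            return np.corrcoef(x[:len(x) - h], y[h:])[0, 1]
        return np.corrcoef(x[-h:], y[:len(y) + h])[0, 1]

    x = np.random.normal(size=500)
    y = np.roll(x, 3) + 0.5 * np.random.normal(size=500)      # y follows x after a lag of 3
    print([round(cross_corr(x, y, h), 2) for h in range(6)])  # peaks near h = 3: x leads y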

  40. Cycles/Seasonality • Suppose a yearly cycle • Sample quarterly (months 3, 6, 9, 12): 3-med, 6-hi, 9-med, 12-lo • Sample every 6 months: 3-med, 9-med • Or 6-hi, 12-lo • To see a cycle, must sample at at least twice its frequency • Demo spreadsheet • This is the Nyquist limit • Compact Disc: samples at 44.1 kHz, top of human hearing is 20 kHz
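
The spreadsheet demo in a few lines of numpy: a yearly sine sampled quarterly versus every 6 months (exactly the Nyquist limit, where what you see depends on phase):

    import numpy as np

    month = np.arange(1, 25)                 # two years, monthly grid
    cycle = np.sin(2 * np.pi * month / 12)   # yearly cycle

    print(np.round(cycle[2::3], 2))   # months 3, 6, 9, 12, ...: the up-down pattern is visible
    print(np.round(cycle[5::6], 2))   # months 6, 12, 18, 24: all zeros; the cycle disappears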

  41. The basic problem • We have data, want to find • Cycle length (e.g. Business cycles), or • Strength of seasonal components • Idea: use sine waves as explanatory variables • If a sine wave at a certain frequency explains things well, then there's a lot of strength. • Could be our cycle's frequency • Or strength of known seasonal component • Explains=correlates

  42. Correlate with Sine Waves • Ordinary covariance with a sine wave at frequency omega: (1/T) * Sum_t Y_t * sin(omega*t) • (assuming both means are zero) • Problem: what if that sine is out of phase with our cycle?

  43. Solution • Also correlate with a cosine • 90 degrees out of phase with sine • Why not also with a 180-out-of-phase? • Because if that had a strong correl, our original sine would have a strong correl of opposite sign. • Sines & Cosines, Oh My—combine using complex variables!

  44. The Discrete Fourier Transform • X_k = Sum_{t=0}^{T-1} Y_t * exp(-i*2*pi*k*t/T) • Often a scaling factor like 1/T, 1/sqrt(T), 1/2pi, etc. out front. • Some people use +i instead of -i • Often look only at the frequencies k=0,...,T-1

  45. Hmm, a sum of products • That reminds me of matrix multiplication. • Define a matrix F whose j,k entry is exp(-i*j*k*2pi/T) • Then the DFT is the matrix-vector product X = F*Y • Matrix multiplication takes T^2 operations • This matrix has a special structure, can do it in about T log T operations • That's the FFT=Fast Fourier Transform • Easiest if T is a power of 2
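
Checking the matrix form against numpy's built-in FFT, which uses the same -i convention and no scaling factor as F defined above:

    import numpy as np

    T = 8
    j, k = np.meshgrid(np.arange(T), np.arange(T), indexing="ij")
    F = np.exp(-1j * 2 * np.pi * j * k / T)      # F[j, k] = exp(-i*j*k*2*pi/T)

    y = np.random.normal(size=T)
    print(np.allclose(F @ y, np.fft.fft(y)))     # True: the FFT computes F*Y, just faster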

  46. So now we have complex values... • Take magnitude & argument of each DFT result • Plot squared magnitude vs. frequency • This is the “Periodogram” • Large value = that frequency is very strong • Often plotted on semilog-y scale, “decibels” • Example spreadsheet
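
A periodogram sketch for a made-up series with a period-8 sine plus noise; the squared-magnitude peak lands at frequency index k = T/8:

    import numpy as np

    T = 256
    t = np.arange(T)
    y = np.sin(2 * np.pi * t / 8) + 0.3 * np.random.normal(size=T)

    X = np.fft.fft(y) / T                 # one common scaling choice
    periodogram = np.abs(X) ** 2          # squared magnitude vs. frequency index k
    print(periodogram.argmax())           # 32 = T/8 (its mirror, T - 32, is equally strong)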

  47. Spreadsheet Experiments • First, play with amplitudes: • (1,0) then (0,1) then (1,.5) then (1,.7) • Next, play with frequency1: • 2*pi/8 then 2*pi/4 • 2*pi/6, 2*pi/7, 2*pi/9, 2*pi/10 • 2*pi/100, 2*pi/1000 • Summarize your results for yourself. Write it down! • Reset to 2*pi/8 then play with phase2: • 0, 1, 2, 3, 4, 5, 6... • Now add some noise to Y_t

  48. Interpretations • Value at k=0 is mean of data series • Called “DC” component • Area under periodogram is proportional to Var(data series) • Height at each point=how much of variance is explained by that frequency • Plotting argument vs. frequency shows phase • Often need to smooth with moving avg.

  49. What is FT of White Noise? • Try it! • Why is it called white noise? • Pink noise, etc. (look up in Wikipedia)
