Module 3 GARCH Models

References

The classics:

• Engle, R.F. (1982), Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of U.K. Inflation, Econometrica.

• Bollerslev, T.P. (1986), Generalized Autoregressive Conditional Heteroscedasticity, Journal of Econometrics.


• Bollerslev T., Engle R. F. and D. B. Nelson (1994), ARCH Models, Handbook of Econometrics Vol. 4.

• Engle, R. F. (2001), GARCH 101: The Use of ARCH/GARCH Models in Applied Econometrics, Journal of Economic Perspectives.


• Until the early 1980s, econometrics had focused almost solely on modeling the means of series, i.e., their actual values.

$y_t = E(y_t \mid x) + \varepsilon_t, \quad \varepsilon_t \sim D(0, \sigma^2)$

For an AR(1) process:

$E_{t-1}(y_t \mid x) = E_{t-1}(y_t) = \alpha + \beta y_{t-1}$

Unconditional moments:

$E(y_t) = \alpha/(1-\beta)$ and $\mathrm{Var}(y_t) = \sigma^2/(1-\beta^2)$

The conditional first moment is time varying, though the unconditional moment is not!

Key distinction: Conditional vs. Unconditional moments.

• Similar idea for the variance

Unconditional variance: $\mathrm{Var}(y_t) = E[(y_t - E[y_t])^2] = \sigma^2/(1-\beta^2)$

Conditional variance: $\mathrm{Var}_{t-1}(y_t) = E_{t-1}[(y_t - E_{t-1}[y_t])^2] = E_{t-1}[\varepsilon_t^2]$

$\mathrm{Var}_{t-1}(y_t)$ is the true measure of uncertainty at time t-1.
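The distinction between conditional and unconditional moments of an AR(1) can be checked numerically. The following is a minimal sketch; the function name `ar1_moments` and its arguments are illustrative choices, not part of the original slides.

```python
def ar1_moments(alpha, beta, sigma2, y_prev):
    """Conditional and unconditional moments of y_t = alpha + beta*y_{t-1} + eps_t.

    Assumes a stationary AR(1) (|beta| < 1) with Var(eps_t) = sigma2.
    """
    if abs(beta) >= 1:
        raise ValueError("stationarity requires |beta| < 1")
    cond_mean = alpha + beta * y_prev        # E_{t-1}[y_t]: moves with y_{t-1}
    cond_var = sigma2                        # Var_{t-1}[y_t] = E_{t-1}[eps_t^2]
    uncond_mean = alpha / (1.0 - beta)       # E[y_t]: constant
    uncond_var = sigma2 / (1.0 - beta ** 2)  # Var[y_t]: constant
    return cond_mean, cond_var, uncond_mean, uncond_var


# The conditional mean changes with the last observation; the unconditional one does not.
print(ar1_moments(1.0, 0.5, 1.0, y_prev=4.0))
print(ar1_moments(1.0, 0.5, 1.0, y_prev=0.0))
```

With homoskedastic errors the conditional variance is constant; the ARCH family discussed next makes it time varying.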

Conditional variance




Stylized Facts of Asset Returns

i) Thick tails - Mandelbrot (1963): leptokurtic (thicker than Normal).

ii) Volatility clustering - Mandelbrot (1963): “large changes tend to be followed by large changes of either sign.”

iii) Leverage effects - Black (1976), Christie (1982): Tendency for changes in stock prices to be negatively correlated with changes in volatility.

iv) Non-trading effects, weekend effects - Fama (1965), French and Roll (1986): When a market is closed, information accumulates at a different rate than when it is open; for example, the weekend effect, where stock price volatility on Monday is not three times the volatility on Friday.

v) Expected events - Cornell (1978), Patell and Wolfson (1979), etc.: Volatility is high at regular times such as news announcements or other expected events, or even at certain times of day; for example, it is lower in the early afternoon.

vi) Volatility and serial correlation - LeBaron (1992): Inverse relationship between the two.

vii) Co-movements in volatility - Ramchand and Susmel (1998): Volatility is positively correlated across markets/assets.


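ARCH Model - Engle (1982)

Auto-Regressive Conditional Heteroskedasticity

$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2$

• This is an AR(q) model for squared innovations. This model cleverly estimates the unobservable (latent) variance.

• Note: Since we are dealing with a variance, we need $\omega > 0$ and $\alpha_i \ge 0$ for all i.

• Even though the errors may be serially uncorrelated, they are not independent: there will be volatility clustering and fat tails.

• Define standardized errors: $z_t = \varepsilon_t/\sigma_t$

They have conditional mean zero and a time-invariant conditional variance equal to 1. That is, $z_t \sim D(0,1)$.

• If, as assumed above, the distribution of $z_t$ is time invariant, with a finite fourth moment (use Jensen’s inequality):

$E[\varepsilon_t^4] = E[z_t^4]\,E[\sigma_t^4] \ge E[z_t^4]\,(E[\sigma_t^2])^2 = E[z_t^4]\,(E[\varepsilon_t^2])^2$

so the unconditional kurtosis of $\varepsilon_t$ is at least as large as that of $z_t$.

If we assume a normal distribution, the 4th moment for an ARCH(1) gives the kurtosis: $E[\varepsilon_t^4]/(E[\varepsilon_t^2])^2 = 3(1-\alpha_1^2)/(1-3\alpha_1^2)$, provided $3\alpha_1^2 < 1$.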

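More convenient, but less intuitive, presentation of the ARCH(1) model:

$\varepsilon_t = \upsilon_t \sqrt{\omega + \alpha_1 \varepsilon_{t-1}^2}$

where $\upsilon_t$ is iid with mean 0 and $\mathrm{Var}[\upsilon_t] = 1$.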

Since υt is iid, the:

It turns out that $\sigma_t^2$ is a very persistent process. Such a process can be captured with an ARCH(q), where q is large. This is not efficient.
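To see volatility clustering and fat tails emerge from an ARCH(1), one can simulate it. This is a minimal sketch under the assumption of Gaussian standardized errors; the helper names (`simulate_arch1`, `sample_kurtosis`, `autocorr1`) and parameter values are illustrative only.

```python
import numpy as np


def simulate_arch1(n, omega, alpha1, seed=0):
    """Simulate eps_t = z_t * sqrt(omega + alpha1 * eps_{t-1}^2), z_t ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    eps = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha1)  # assumption: start at the unconditional variance
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha1 * eps[t - 1] ** 2
        eps[t] = np.sqrt(sigma2[t]) * z[t]
    return eps, sigma2


def sample_kurtosis(x):
    """Fourth central moment divided by the squared variance (3 for a normal)."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2


def autocorr1(x):
    """First-order sample autocorrelation."""
    x = x - x.mean()
    return np.sum(x[1:] * x[:-1]) / np.sum(x ** 2)


eps, sigma2 = simulate_arch1(20000, omega=0.5, alpha1=0.5)
print("kurtosis of eps:", sample_kurtosis(eps))          # above 3: fat tails
print("autocorr of eps:", autocorr1(eps))                # near 0: uncorrelated levels
print("autocorr of eps^2:", autocorr1(eps ** 2))         # positive: clustering
```

The levels are close to serially uncorrelated while the squares are not, which is the sense in which the errors are uncorrelated but not independent.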


GARCH – Bollerslev (1986)

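In practice q is often large. A more parsimonious representation is the Generalized ARCH model or GARCH(q,p):

$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$

which is an ARMA(max(p,q),p) model for the squared innovations.


This is covariance stationary if all the roots of

$1 - \alpha(L) - \beta(L) = 0$

lie outside the unit circle. For the GARCH(1,1) this amounts to $\alpha_1 + \beta_1 < 1$.

• Bollerslev (1986) showed that if $3\alpha_1^2 + 2\alpha_1\beta_1 + \beta_1^2 < 1$, the second and 4th moments of $\varepsilon_t$ exist.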
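The GARCH(1,1) recursion and the two moment conditions can be sketched in a few lines. This is an illustrative sketch, not the slides' code: the function names are hypothetical, and the choice of starting value for the variance is an assumption.

```python
import numpy as np


def garch11_filter(eps, omega, alpha1, beta1):
    """Conditional variances sigma_t^2 = omega + alpha1*eps_{t-1}^2 + beta1*sigma_{t-1}^2."""
    eps = np.asarray(eps, dtype=float)
    sigma2 = np.empty(eps.size)
    if alpha1 + beta1 < 1:
        sigma2[0] = omega / (1.0 - alpha1 - beta1)  # assumption: unconditional variance
    else:
        sigma2[0] = np.mean(eps ** 2)               # assumption: sample second moment
    for t in range(1, eps.size):
        sigma2[t] = omega + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return sigma2


def is_covariance_stationary(alpha1, beta1):
    """GARCH(1,1) covariance stationarity: alpha1 + beta1 < 1."""
    return alpha1 + beta1 < 1


def has_finite_fourth_moment(alpha1, beta1):
    """Bollerslev (1986) condition: 3*alpha1^2 + 2*alpha1*beta1 + beta1^2 < 1."""
    return 3 * alpha1 ** 2 + 2 * alpha1 * beta1 + beta1 ** 2 < 1


print(garch11_filter([1.0, 2.0, 0.0], omega=0.1, alpha1=0.1, beta1=0.8))
print(is_covariance_stationary(0.3, 0.69), has_finite_fourth_moment(0.3, 0.69))
```

The second line of output illustrates that a model can be covariance stationary while its fourth moment does not exist.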


Forecasting and Persistence

  • Consider the forecast in a GARCH(1,1) model

Taking expectation at time t

By repeated substitutions:

As j→∞, the forecast reverts to the unconditional variance: $\omega/(1-\alpha_1-\beta_1)$.

• When $\alpha_1+\beta_1=1$, today’s volatility affects future forecasts forever:
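The repeated substitution can be sketched as a simple recursion, $E_t[\sigma_{t+j}^2] = \omega + (\alpha_1+\beta_1) E_t[\sigma_{t+j-1}^2]$, started from the known $\sigma_{t+1}^2$. The function name `garch11_forecast` is a hypothetical helper, and the parameter values are illustrative.

```python
def garch11_forecast(sigma2_next, omega, alpha1, beta1, horizon):
    """Forecasts E_t[sigma^2_{t+j}] for j = 1, ..., horizon.

    sigma2_next is sigma^2_{t+1}, which is known at time t.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    forecasts = [sigma2_next]
    for _ in range(horizon - 1):
        forecasts.append(omega + (alpha1 + beta1) * forecasts[-1])
    return forecasts


# Stationary case: forecasts decay toward omega / (1 - alpha1 - beta1) = 1.0
print(garch11_forecast(2.0, omega=0.1, alpha1=0.1, beta1=0.8, horizon=5))

# Integrated case (alpha1 + beta1 = 1): no mean reversion; the forecast
# is sigma^2_{t+1} + (j - 1) * omega, so today's variance never washes out.
print(garch11_forecast(2.0, omega=0.1, alpha1=0.2, beta1=0.8, horizon=5))
```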


Nelson’s (1991) EGARCH model

Nelson, D.B. (1991), "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica.


Glosten, Jagannathan and Runkle’s (1993) GJR-GARCH model

Glosten, L.R., R. Jagannathan and D. Runkle (1993), "On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks," Journal of Finance.

where $I_{t-i} = 1$ if $\varepsilon_{t-i} < 0$; 0 otherwise.

• Both models capture sign (asymmetric) effects in volatility: negative news increases the conditional volatility (leverage effect).
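The asymmetry can be illustrated with a one-step variance update. The sketch below assumes the common GJR(1,1) form $\sigma_t^2 = \omega + \alpha_1\varepsilon_{t-1}^2 + \gamma_1\varepsilon_{t-1}^2 I_{t-1} + \beta_1\sigma_{t-1}^2$; the function name and the parameter values are illustrative and may differ from the notation on the original slide.

```python
def gjr_next_variance(eps_prev, sigma2_prev, omega, alpha1, gamma1, beta1):
    """One-step GJR(1,1) update (assumed form, see lead-in).

    The indicator equals 1 only for negative shocks, so gamma1 > 0 makes
    bad news raise next-period variance by more than good news of equal size.
    """
    indicator = 1.0 if eps_prev < 0 else 0.0
    return omega + (alpha1 + gamma1 * indicator) * eps_prev ** 2 + beta1 * sigma2_prev


params = dict(omega=0.1, alpha1=0.05, gamma1=0.1, beta1=0.8)
print(gjr_next_variance(-1.0, 1.0, **params))  # negative shock
print(gjr_next_variance(+1.0, 1.0, **params))  # positive shock of the same size
```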


Non-linear ARCH model NARCH

Higgins and Bera (1992) and Hentschel (1995)

These models apply the Box-Cox transformation to the conditional variance.

Special case: γ=2 (standard GARCH model).

• The variance depends on both the size and the sign of the innovations, which helps to capture leverage-type (asymmetric) effects.


Threshold ARCH (TARCH)

Rabemananjara, R. and J.M. Zakoian (1993), “Threshold ARCH Models and Asymmetries in Volatilities,” Journal of Applied Econometrics.

Large events have an effect, but small events have none.

There are two variances:

Many other versions are possible by adding minor asymmetries or non-linearities in a variety of ways.


Switching ARCH (SWARCH)

Hamilton, J. D. and R. Susmel (1994), "Autoregressive Conditional Heteroskedasticity and Changes in Regime," Journal of Econometrics.

  • Intuition:
  • - Hamilton (1989) models time series with changes of regime. Simplest case: 2-state process.
  • - Hamilton assumes the existence of an unobserved variable, $s_t$, that can take two values: one or two (or zero or one).
  • - Hamilton postulates a Markov transition matrix, P, for the evolution of the unobserved variable:
  • $p(s_t = 1 \mid s_{t-1} = 1) = p$
  • $p(s_t = 2 \mid s_{t-1} = 1) = 1-p$
  • $p(s_t = 1 \mid s_{t-1} = 2) = q$
  • $p(s_t = 2 \mid s_{t-1} = 2) = 1-q$
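The transition probabilities above define a 2-state Markov chain whose long-run (ergodic) state probabilities follow directly from p and q. A minimal sketch, using the same parameterization as the list above (q is the probability of moving from state 2 to state 1); the function names are hypothetical.

```python
import numpy as np


def transition_matrix(p, q):
    """Rows: previous state (1, 2). Columns: current state (1, 2)."""
    return np.array([[p, 1.0 - p],
                     [q, 1.0 - q]])


def ergodic_probabilities(p, q):
    """Stationary distribution pi solving pi = pi @ P.

    Assumes 1 - p + q > 0, i.e., the chain is not stuck in both states.
    """
    pi1 = q / (1.0 - p + q)
    return np.array([pi1, 1.0 - pi1])


P = transition_matrix(0.9, 0.2)
pi = ergodic_probabilities(0.9, 0.2)
print(P)
print(pi, pi @ P)  # pi is unchanged by one more transition
```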
Reformulate the ARCH(q) equation to make the conditional variance dependent on $s_t$, i.e., the state of the economy.
  • A parsimonious formulation:

For a SWARCH(1) with 2 states (1 and 2) we have 4 possible $\sigma_t^2$:


• The parameter $\gamma_{s_t=1}$ is set to 1. Then, the parameter $\gamma_{s_t=2}$ is a relative volatility scale parameter. If $\gamma_{s_t=2} = 3$, then volatility in state 2 is three times as high as in state 1.

• In SWARCH models, the states refer to the states of volatility. For a 2-state example, we have “high” or “low” volatility states.

Since we have an unobservable variable, estimation is usually done with a variation of the Kalman filter.

• Estimation of the model will estimate the volatility parameters and the transition probabilities. As a byproduct of the estimation, we will also have an estimate for the latent variable, i.e., the “state.”


Integrated GARCH (IGARCH)

The standard GARCH model

is covariance stationary if

• But strict stationarity does not require such a stringent restriction (that is, that the unconditional variance does not depend on t).

If we allow $\alpha_1 + \beta_1 = 1$, we have the IGARCH model.

• In the IGARCH model the autoregressive polynomial in the ARMA representation has a unit root: a shock to the conditional variance is “persistent.”


Today’s variance remains important for future forecasts of all horizons.

• This is the Integrated GARCH model (IGARCH).

• Nelson (1990) establishes that, as this satisfies the requirement for strict stationarity, it is a well defined model.

• In practice, it is often found that $\alpha_1 + \beta_1$ is close to 1.

• We may suspect that IGARCH is more a product of omitted structural breaks than the result of true IGARCH behavior. See Lamoureux and Lastrapes (1989) and Hamilton and Susmel (1994).


FIGARCH Model

Baillie, Bollerslev and Mikkelsen (1996), Journal of Econometrics.

• Recall the ARIMA(p,d,q) model $(1-L)^d \alpha(L) y_t = \beta(L) \varepsilon_t$, where $\alpha(L)$ and $\beta(L)$ are lag polynomials of orders p and q, and $\varepsilon_t$ is white noise. When d=1, $y_t$ is an integrated process. In time series, it is usually assumed that d = 1, 2, ..., D.

• But d can be any positive number, for example, 0<d<1. In this case, we have a fractionally integrated process, or ARFIMA. (See Granger and Joyeux (1980).)

• d is called the fractional integration parameter.

• When $d \in (-1/2, 1/2)$, the series is stationary and invertible. See Hosking (1981).


• Similar intuition carries over to the GARCH(q,p) model.

• Recall the ARMA representation for the GARCH process:

• Now, the FIGARCH process is defined as:

• When d=0, we have a GARCH(q,p); when d=1, we have IGARCH.

• This model captures long-run persistence (long memory).

• Questions
  • 1) Lots of ARCH models. Which one to use?
  • 2) Choice of p and q. How many lags to use?
• Hansen and Lunde (2004) compared lots of ARCH models:
  • - It turns out that the GARCH(1,1) is a great starting model.
  • - Add a leverage effect for financial series and it’s even better.

Estimation: MLE

All of these models can be estimated by maximum likelihood. First we need to construct the sample likelihood.

Since we are dealing with dependent variables, we use the conditioning trick to get the joint distribution:

Taking logs


Assuming normality, we maximize with respect to θ the function:

Example: ARCH(1) model.

Taking derivatives with respect to θ=(ω,α,γ), where γ contains the K mean parameters:

Note that $\partial L/\partial \gamma = 0$ (K f.o.c.’s) will give us GLS. Denote $\partial L/\partial \theta = S(y_t, \theta) = 0$ (S(.) is the score vector).
  • We have a (K+2)×(K+2) system. But it is a non-linear system, so we will need to use numerical optimization.
  • Gauss-Newton or BHHH can be easily implemented.
  • Given the AR structure, we will need to make assumptions about $\sigma_0$ (and $\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_p$ if we assume an AR(p) process for the mean).
  • Alternatively, we can take $\sigma_0$ (and $\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_p$) as parameters to be estimated (this can be computationally more intensive and estimation can lose power).
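The Gaussian log-likelihood for the ARCH(1) example can be written down directly. The sketch below assumes a constant mean (K = 1) and, as one of the possible initializations mentioned above, sets the first conditional variance to the sample second moment of the residuals. All function names are hypothetical; in practice the negative of `arch1_loglik` would be handed to a numerical optimizer.

```python
import numpy as np


def arch1_loglik(params, y):
    """Gaussian log-likelihood of y_t = mu + eps_t, sigma_t^2 = omega + alpha1*eps_{t-1}^2."""
    mu, omega, alpha1 = params
    eps = np.asarray(y, dtype=float) - mu
    sigma2 = np.empty_like(eps)
    sigma2[0] = np.mean(eps ** 2)               # assumption about the initial variance
    sigma2[1:] = omega + alpha1 * eps[:-1] ** 2
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + eps ** 2 / sigma2)


def simulate_arch1_with_mean(n, mu, omega, alpha1, seed=0):
    """Simulated data for the illustration (Gaussian standardized errors assumed)."""
    rng = np.random.default_rng(seed)
    y = np.empty(n)
    eps_prev = 0.0
    for t in range(n):
        sigma2 = omega + alpha1 * eps_prev ** 2
        eps_prev = np.sqrt(sigma2) * rng.standard_normal()
        y[t] = mu + eps_prev
    return y


y = simulate_arch1_with_mean(3000, mu=0.1, omega=0.5, alpha1=0.5, seed=1)
ll_true = arch1_loglik((0.1, 0.5, 0.5), y)
ll_no_arch = arch1_loglik((y.mean(), y.var(), 0.0), y)  # homoskedastic benchmark
print(ll_true, ll_no_arch)  # the ARCH parameters fit the simulated data better
```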

Note: The appeal of MLE is the optimal properties of the resulting estimators under ideal conditions.

Crowder (1976) gives one set of sufficient regularity conditions for the MLE in models with dependent observations to be consistent and asymptotically normally distributed. Verifying these regularity conditions is very difficult for general ARCH models; proofs exist for special cases like GARCH(1,1). For the GARCH(1,1) model: if $E[\ln(\alpha_1 z_t^2 + \beta_1)] < 0$, the model is strictly stationary and ergodic. See Lumsdaine (1992).

If the conditional density is well specified and $\theta_0$ belongs to Ω, then
  • Common practice in empirical studies: Assume the necessary regularity conditions are satisfied.

• Under the correct specification assumption, $A_0 = B_0$, where

We estimate $A_0$ and $B_0$ by replacing $\theta_0$ by its estimated MLE value.

The estimator of $B_0$ has a computational advantage over that of $A_0$: only first derivatives are needed. But $A_0 = B_0$ only if the distribution is correctly specified. This is very difficult to know in practice.


• Block-diagonality

In many applications of ARCH, the parameters can be partitioned into mean parameters, $\theta_1$, and variance parameters, $\theta_2$.

Then, $\partial\mu_t(\theta)/\partial\theta_2 = 0$ and, although $\partial\sigma_t(\theta)/\partial\theta_1 \neq 0$, the information matrix is block-diagonal (under general symmetric distributions for $z_t$ and for particular ARCH specifications).

Not a bad result:

- Regression can be consistently done with OLS.

- Asymptotically efficient estimates for the ARCH parameters can be obtained on the basis of the OLS residuals.

• But block diagonality can’t buy everything:

- Conventional OLS standard errors could be terrible.

- When testing for serial correlation in the presence of ARCH, the conventional Bartlett s.e., $n^{-1/2}$, could seriously underestimate the true s.e.


Estimation: QMLE

• The assumption of conditional normality is difficult to justify in many empirical applications. But, it is convenient.

• The MLE based on the normal density may be given a quasi-maximum likelihood (QMLE) interpretation.

• If the conditional mean and variance functions are correctly specified, the normal quasi-score evaluated at θ0 has a martingale difference property:


Since this equation holds for any value of the true parameters, the QMLE, say $\theta_{QMLE}$, is Fisher-consistent, i.e., $E[S(y_T, y_{T-1}, \ldots, y_1; \theta)] = 0$ for any $\theta \in \Omega$.


• The asymptotic distribution for the QMLE takes the form:

The covariance matrix $A_0^{-1} B_0 A_0^{-1}$ is called “robust”: robust to departures from “normality.”

• Bollerslev and Wooldridge (1992) study the finite sample distribution of the QMLE and the Wald statistics based on the robust covariance matrix estimator:

For symmetric departures from conditional normality, the QMLE is generally close to the exact MLE.

For non-symmetric conditional distributions both the asymptotic and the finite sample loss in efficiency may be large.
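The robust ("sandwich") covariance can be assembled from per-observation scores and the average Hessian. The sketch below assumes the usual conventions, $A_0$ = minus the expected Hessian of the per-observation log-likelihood and $B_0$ = the expected outer product of the score; since the slide's own definitions are not reproduced in this transcript, treat those conventions and the function name as assumptions.

```python
import numpy as np


def robust_covariance(scores, hessian_avg):
    """Estimate Var(theta_hat) = (1/T) * A^{-1} B A^{-1}.

    scores:      T x k array, score of each observation at theta_hat
    hessian_avg: k x k array, average Hessian of the per-observation log-likelihood
    """
    scores = np.asarray(scores, dtype=float)
    T = scores.shape[0]
    A = -np.asarray(hessian_avg, dtype=float)  # assumed convention: A = -E[Hessian]
    B = scores.T @ scores / T                  # outer product of the scores
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv / T


# Toy check: Gaussian mean with unit quasi-variance. The score is (y_t - mu) and
# the Hessian is -1, so the robust variance equals the sample variance over T.
y = np.array([1.0, 2.0, 3.0, 4.0])
scores = (y - y.mean()).reshape(-1, 1)
print(robust_covariance(scores, np.array([[-1.0]])))
```

When the distribution is correctly specified, A and B coincide in the limit and the expression collapses to the usual inverse-information covariance.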


Estimation: GMM

• Suppose we have an ARCH(q). We need moment conditions:

Note: (1) refers to the conditional mean, (2) refers to the conditional variance, and (3) to the unconditional mean.

GMM objective function:



• γ has K free parameters; α has q free parameters. Then, we have a=K+q+1 parameters.

• m(θ;X,y) has r=k+m+2 equations.

• Dimensions: Q is 1×1; E[m(θ;X,y)] is r×1; W is r×r.

• Problem is over-identified: there are more equations than parameters, so we cannot solve E[m(θ;X,y)]=0 exactly.

• Choose a weighting matrix W for the objective function and minimize using a nonlinear solver (for example, optmum in GAUSS).

• Optimal weighting matrix: $W = [E[m(\theta;X,y)\, m(\theta;X,y)']]^{-1}$.

• $\mathrm{Var}(\hat\theta) = (1/T)[D' W D]^{-1}$,

where $D = \partial E[m(\theta;X,y)]/\partial\theta'$ (all these expressions are evaluated at $\hat\theta$).

Testing
  • White’s (1980) general test for heteroskedasticity.
  • Engle’s (1982) LM test: $TR^2 \sim \chi^2_q$.
  • In ARCH models, testing as usual: LR, Wald, and LM tests. Reliable inference from the LM, Wald and LR test statistics generally requires moderately large sample sizes of at least two hundred observations.
  • Issues:
  • - Non-negativity constraints must be imposed. $\theta_0$ is often on the boundary of Ω. (Two-sided tests may be conservative.)
  • - Lack of identification of certain parameters under H0, creating a singularity of the information matrix under H0. For example, under H0: $\alpha_1 = 0$ (no ARCH), in the GARCH(1,1), ω and $\beta_1$ are not jointly identified. See Davies (1977).
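Engle's LM statistic can be computed with an auxiliary regression of squared residuals on a constant and q of their own lags. A minimal NumPy sketch follows; the function name `arch_lm_statistic` is hypothetical, and the comparison with the $\chi^2_q$ critical value is left to the reader (for q = 1 the 5% critical value is about 3.84).

```python
import numpy as np


def arch_lm_statistic(resid, q):
    """T * R^2 from regressing e_t^2 on a constant and e_{t-1}^2, ..., e_{t-q}^2."""
    e2 = np.asarray(resid, dtype=float) ** 2
    T = e2.size - q                                    # usable observations
    y = e2[q:]
    lags = [e2[q - i:-i] for i in range(1, q + 1)]     # e_{t-i}^2 aligned with y
    X = np.column_stack([np.ones(T)] + lags)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return T * r2


# Illustration on simulated ARCH(1) residuals (parameter values are arbitrary).
rng = np.random.default_rng(0)
n = 2000
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = np.sqrt(0.5 + 0.5 * eps[t - 1] ** 2) * rng.standard_normal()
print(arch_lm_statistic(eps, q=1))  # large relative to the chi-square(1) critical value
```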

Ignoring ARCH

Hamilton, J.D. (2008), “Macroeconomics and ARCH,” Working paper, UCSD.

• Many macroeconomic and financial time series have an AR structure. What happens when ARCH effects are ignored?

Assume $y_t = \gamma_0 + \gamma_1 y_{t-1} + \varepsilon_t$, where $\varepsilon_t$ follows a GARCH(1,1) model.

Then, ignoring ARCH:

Assume the 4th moment exists; standard consistency results give us


For simplicity assume $\gamma_0 = 0$. Then, $T^{1/2}\hat\gamma_1$ is approximately N(0,1). But,

Under H0: No ARCH, the second summation is a MDS with variance

Using CLT:

To calculate the value of the variance, recall the ARMA(1,1) representation for GARCH(1,1) models:


For an ARMA(1,1):

Then, after some substitutions:

Note: $V_{11} \ge 1$, with equality iff $\alpha_1 = 0$. OLS treats $T^{1/2}\hat\gamma_1$ as N(0,1), but the true asymptotic distribution is N(0, $V_{11}$). OLS tests reject more often. As $\alpha_1$ and $\beta_1$ get closer to the region where the fourth moment is infinite ($\mu_4 = \infty$), we reject even more.


Figure 1. (From Hamilton (2008).) Asymptotic rejection probability for OLS t-test that autoregressive coefficient is zero as a function of GARCH(1,1) parameters α and δ. Note: null hypothesis is actually true and test has nominal size of 5%.


• If the ARCH parameters are in the usual range found in estimates of GARCH models, an OLS t-test with no correction for heteroskedasticity would spuriously reject with arbitrarily high probability for a sufficiently large sample.

  • The good news is that the rate of divergence is slow: it may take a lot of observations before the accumulated excess kurtosis overwhelms the other factors.
  • The solid line in Figure 2 plots the fraction of samples for which an OLS t-test of $\gamma_1 = 0$ exceeds two in absolute value. Thinking we’re only rejecting a true null hypothesis 5% of the time, we would do so 15% of the time when T = 100 and 33% of the time when T = 1,000.
  • White’s (1980) s.e. help. Newey-West’s (1987) s.e. help less.
  • Engle’s $TR^2$ is very good. Better than White’s (1980), as expected.

Figure 2. From Hamilton (2008). Fraction of samples in which OLS t-test leads to rejection of the null hypothesis that autoregressive coefficient is zero as a function of the sample size for regression with Gaussian errors (solid line) and Student’s t errors (dashed line). Note: null hypothesis is actually true and test has nominal size of 5%.



Engle, R.F., D. Lilien and R. Robins (1987), “Estimating Time Varying Risk Premia in the Term Structure: the ARCH-M Model,” Econometrica.

• Finance theory suggests that the mean of a relationship will be affected by the volatility or uncertainty of a series.

ARCH in mean (ARCH-M) framework:

The variance or the standard deviation are included in the mean relationship.

The difference from the previous ARCH/GARCH models is that the volatility also enters the mean of the return.
  • This is exactly what Merton’s (1973, 1980) ICAPM produces: a risk-return tradeoff. It must be the case that δ > 0.

• Again, we have a Davies (1977)-type problem.

Let $\mu_t(\theta) = \mu + \delta\sigma_t(\theta)$, with μ ≠ 0.

δ is only identified if the conditional variance is time-varying. Thus, a standard joint test for ARCH effects and δ = 0 is not feasible.

Note: Block-diagonality does not hold for the ARCH-M model. Consistent estimation requires correct specification of cond. mean and variance. (And simultaneous estimation.)


Non normality assumptions

The basic GARCH model allows a certain amount of leptokurtosis.

It is often insufficient to explain real world data.

Solution: Assume a distribution other than the normal, which helps to allow for the fat tails in the distribution.

• t Distribution - Bollerslev (1987)

The t distribution has a degrees of freedom parameter which allows greater kurtosis. The t likelihood function is

where Γ is the gamma function and υ is the degrees of freedom.

As υ→∞, this tends to the normal distribution.

• GED Distribution - Nelson (1991)


Multivariate ARCH Models

Engle, R.F. and K.F. Kroner (1993), Multivariate Simultaneous Generalized ARCH, working paper, Department of Economics, UCSD.

  • It is common in Finance to stress the importance of covariance terms. The above model can handle this if y is a vector and we interpret the variance term as a complete covariance matrix. The whole analysis carries over into a system framework:

• From an econometric theory point of view, multivariate ARCH models add no problems. The log likelihood assuming normality is:


• Several practical issues:

- A direct extension of the GARCH model would involve a very large number of parameters (for 4 assets, we have to estimate 10 elements in $\Omega_t$).

- The conditional variance could easily become negative even when all the parameters are positive.

- The chosen parameterization should allow causality between variances.

- Covariances and Correlations: How to model them?


Vector ARCH

Let vech denote the matrix stacking operation

A general extension of the GARCH model would then be

W is a vector with N(N+1)/2 elements; A(L) and B(L) are square matrices with N(N+1)/2 × N(N+1)/2 elements. Total parameters (for p=q=1): $N(N+1)/2 + N^2(N+1)^2/2$.

This quickly produces huge numbers of parameters: for p=q=1 and N=5 there are 465 parameters to estimate here.


• One simplification used is the Diagonal GARCH model where A and B are taken to be diagonal, but this assumes away causality in variances and co-persistence. We need still more restrictions to ensure positive definiteness in the covariance matrix.

• A more tractable alternative: the BEKK model

• V is a lower triangular matrix with N(N+1)/2 parameters; A and B are square matrices with $N^2$ parameters each.

• BEKK guarantees p.d. for Σt, since it works with quadratic forms.

• We can further reduce the parameterization by making A and B diagonal.
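The parameter counts quoted for the vech, diagonal and BEKK specifications can be reproduced with simple arithmetic. A small sketch follows; the function names are hypothetical, and the counts assume one coefficient matrix per ARCH lag and per GARCH lag.

```python
def vech_param_count(n_assets, p=1, q=1):
    """Full vech model: intercept vector plus (p + q) unrestricted m x m matrices."""
    m = n_assets * (n_assets + 1) // 2  # distinct elements of the covariance matrix
    return m + (p + q) * m * m


def diagonal_vech_param_count(n_assets, p=1, q=1):
    """Diagonal GARCH: each coefficient matrix keeps only its m diagonal terms."""
    m = n_assets * (n_assets + 1) // 2
    return m + (p + q) * m


def bekk_param_count(n_assets, p=1, q=1):
    """BEKK: triangular intercept plus (p + q) unrestricted N x N matrices."""
    m = n_assets * (n_assets + 1) // 2
    return m + (p + q) * n_assets * n_assets


for n in (2, 5, 10):
    print(n, vech_param_count(n), diagonal_vech_param_count(n), bekk_param_count(n))
```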


Factor ARCH

Suppose a vector of N series has a common factor structure. Such as:

where ξ are the common factors and

then the conditional covariance matrix of y is given by

If $\Lambda_t$ is diagonal with elements $\lambda_{kt}$, or if the off-diagonal elements are constant and combined into Ψ, then the model may be written as


So given a set of factors we may estimate a parsimonious model for the covariance matrix once we have parameterized $\lambda_k$.

• One assumption is that we observe a set of factors which cause the variance; then we can simply use these. For example, “the market,” liquidity, interest rates, exchange rates, etc.

• Diebold and Nerlove (1989) use a factor ARCH structure, but with $\lambda_k$ as a latent variable. (Estimation: Kalman filter.)

• Another common assumption is that each factor has a univariate GARCH representation.

• Application of Factor ARCH: Common Factors. (Engle and Kozicki (1993), Engle and Susmel (1993).)

Realized Volatility (RV) Models
  • French, Schwert and Stambaugh (1987) use higher-frequency data to estimate the variance as:

where $r_t$ is the realized daily return, and we estimate the monthly variance.

• Model-free measure –i.e., no need for ARCH-family specifications.

• This method is used a lot for intra-daily data, called high frequency (HF) data.

• Very popular to calculate intra-day or daily volatility. For example, based on TAQ data, using, say, 1-minute or 10-minute realized returns, we can calculate the daily variance, or realized volatility, $RV_t$:


where $r_{t,j}$ is the jth interval return on day t. That is, RV is defined as the sum of squared intraday returns.
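The realized variance for one day is just the sum of squared intraday log returns. A minimal sketch, assuming a vector of equally spaced intraday prices for a single day; the function name and the toy prices are illustrative.

```python
import numpy as np


def realized_variance(intraday_prices):
    """Sum of squared intraday log returns for one day."""
    log_p = np.log(np.asarray(intraday_prices, dtype=float))
    returns = np.diff(log_p)      # r_{t,j}: j-th interval return on day t
    return np.sum(returns ** 2)


# Toy day with two intervals: +1% and -1% log returns.
prices = 100.0 * np.exp(np.array([0.0, 0.01, 0.0]))
print(realized_variance(prices))
```

The sampling interval is a choice of the user; as noted below, very fine sampling makes microstructure effects more important.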

• We can use time series models, say an ARIMA, for $RV_t$ to forecast daily volatility.

• RV is affected by microstructure effects: bid-ask bounce, infrequent trading, calendar effects, etc. For example, the bid-ask bounce induces serial correlation in intra-day returns, which biases $RV_t$. (Big problem!)

- Proposed solution: filter the intra-day returns using MA or AR models before constructing RV measures.


• Under some conditions (bounded kurtosis and autocorrelation of squared returns less than 1), $RV_t$ is consistent and m.s. convergent.

• Realized volatility is a measure. It has a distribution.

• For returns, the distribution of RV is non-normal (as expected). It tends to be skewed right and leptokurtic. For log returns, the distribution is approximately normal.

• Daily returns standardized by RV measures are nearly Gaussian.

• RV is highly persistent.

• The key problem is the choice of sampling frequency (or number of observations per day).

— Bandi and Russell (2003) propose a data-based method for choosing frequency that minimizes the MSE of the measurement error.

— Simulations and empirical examples suggest optimal sampling is around 1-3 minutes for equity returns.


• Another method: AR model for volatility:

The εt are estimated from a first step procedure -i.e., a regression.

Make sure that the estimates are positive.

• Parkinson’s (1980) estimator:

$s^2 = \frac{1}{4\ln(2)\,T} \sum_t (\ln H_t - \ln L_t)^2$,

where $H_t$ is the highest price and $L_t$ is the lowest price.

• There is an RV counterpart, using HF data: Realized Range (RR):

$RR_t = \frac{1}{4\ln(2)} \sum_j [100 \times (\ln H_{t,j} - \ln L_{t,j})]^2$,

where $H_{t,j}$ and $L_{t,j}$ are the highest and lowest price in the jth interval.
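Both range estimators reduce to a few array operations. This is a sketch under the assumption that the square applies to the (scaled) log range, as written above; the function names and the `scale` argument are illustrative.

```python
import numpy as np


def parkinson_variance(high, low):
    """Parkinson (1980): average squared log range divided by 4*ln(2)."""
    hl = np.log(np.asarray(high, dtype=float)) - np.log(np.asarray(low, dtype=float))
    return np.sum(hl ** 2) / (4.0 * np.log(2.0) * hl.size)


def realized_range(high, low, scale=100.0):
    """Realized range for one day: sum over intraday intervals, no averaging."""
    hl = np.log(np.asarray(high, dtype=float)) - np.log(np.asarray(low, dtype=float))
    return np.sum((scale * hl) ** 2) / (4.0 * np.log(2.0))


highs = np.array([101.0, 102.5, 101.8])
lows = np.array([99.5, 100.9, 100.2])
print(parkinson_variance(highs, lows))
print(realized_range(highs, lows))
```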

• These “range” estimators are very good and very efficient.

Reference: Christensen and Podolskij (2005).

Stochastic volatility (SV/SVOL) models

Jacquier, E., Polson, N., Rossi, P. (1994), Bayesian analysis of stochastic volatility models, Journal of Business and Economic Statistics. (Estimation)

Heston, S.L. (1993), A closed-form solution for options with stochastic volatility with applications to bond and currency options, Review of Financial Studies. (Theory)

  • The difference with ARCH models: The shocks that govern the volatility are not necessarily the $\varepsilon_t$’s.

Or using logs:

• We have 3 SVOL parameters to estimate: φ = (ω, β, $\sigma_v$).
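A simulation makes the role of the three parameters concrete. The sketch below assumes the common log-AR(1) specification, $\ln \sigma_t^2 = \omega + \beta \ln \sigma_{t-1}^2 + \sigma_v v_t$ with $y_t = \sigma_t u_t$ and independent standard normal shocks $u_t$ and $v_t$. Since the slide's own equations are not reproduced in this transcript, that functional form, the function name and the parameter values are assumptions.

```python
import numpy as np


def simulate_sv(n, omega, beta, sigma_v, seed=0):
    """Simulate a log-AR(1) stochastic volatility model (assumed form, see lead-in)."""
    rng = np.random.default_rng(seed)
    log_h = np.empty(n)
    log_h[0] = omega / (1.0 - beta)          # start at the mean of log variance
    for t in range(1, n):
        # volatility has its own shock v_t, separate from the return shock
        log_h[t] = omega + beta * log_h[t - 1] + sigma_v * rng.standard_normal()
    h = np.exp(log_h)                        # variances are positive by construction
    y = np.sqrt(h) * rng.standard_normal(n)  # returns
    return y, h


y, h = simulate_sv(1000, omega=-0.5, beta=0.95, sigma_v=0.2)
print(y[:5])
print(h[:5])
```

Unlike in a GARCH model, the variance path here cannot be recovered exactly from past returns, which is why estimation treats it as latent.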


• This is really a discretization of a continuous-time model, where the mean and the variance follow two OU processes.

• SVOL models can be estimated by MLE, QMLE or other methods. In general, Bayesian methods (Gibbs sampling, MCMC) are used.

• Brief Review: Bayesian Estimation

Idea: We are not estimating a parameter value (θ), but rather updating and sharpening our subjective beliefs about θ.

• The centerpiece of the Bayesian methodology is Bayes’ theorem:

P(A|B) = P(A ∩B)/P(B) = P(B|A) P(A)/P(B).

Think of B as “something known” –for example, the data- and A as “something unknown” –e.g., the coefficients of a model.

• Our interest: Value of the parameters (θ), given the data (y).


• We can write:

P(θ|y) = P(y|θ) P(θ)/P(y) (“Bayesian learning”)

• For estimation, we can ignore the term P(y), since the data do not depend on the parameters. Then, we can write:

P(θ|y) ∝ P(y|θ) P(θ)

• Terminology:

- P(y|θ) : Density of the data, given the parameters (“likelihood function”).

- P(θ): Prior density of the parameters. Prior belief of the researcher.

- P(θ|y): Posterior density of the parameters, given the data. (A mixture of the prior and the “current information” from the data.)

Note: Posterior is proportional to likelihood times prior.

• Prior information is a controversial aspect of Bayesian econometrics since it sounds unscientific. Where do priors come from?


• Priors can have any form. However, it is common to choose particular classes of priors which are easy to interpret and/or make computation easier.

• Conjugate prior: prior and posterior both have same class of distributions.

• Prior can be interpreted as arising from an imaginary data set from the same process which generated the actual data.

• Prior and the likelihood are needed to get the posterior.

• Once we get more data, the posterior becomes the prior and we update again.

• The calculations involved in Bayesian analysis can be burdensome.


Example: Linear Model y = Xβ + ε

- Suppose that the data is normal, i.e., $f(y \mid \beta, \sigma, X) = N(X\beta, \sigma^2 I)$.

- X’s are fixed.

- Assume $\beta \mid \sigma^2 \sim N(m, \sigma^2 A)$.

- Assume σ is known (to simplify).

Note: m represents the best guess for β before seeing y and X. A represents the confidence in the guess.

Recall that we can write

$y - X\beta = (y - Xb) - X(\beta - b)$, where $b = (X'X)^{-1}X'y$

$(y - X\beta)'(y - X\beta) = (y - Xb)'(y - Xb) + (\beta - b)'X'X(\beta - b) - 2(\beta - b)'X'(y - Xb)$

$= \upsilon s^2 + (\beta - b)'X'X(\beta - b)$

where $s^2 = SSE/(T-k) = (y - Xb)'(y - Xb)/(T-k)$ and $\upsilon = T-k$. The cross term vanishes because $X'(y - Xb) = 0$.


• The likelihood can be written as:

The likelihood can be written as a product of a normal and a density of form $f(\theta) = \kappa\,\theta^{-\lambda} \exp\{-\lambda/\theta\}$.

This is called an inverted gamma (inverse of a $\chi^2$) distribution.

Note: Bayesians work with $h = 1/\sigma^2$, which they call “precision.” A gamma prior is usually assumed for h.

• Then, the posterior is:


where $m^* = (A^{-1} + X'X)^{-1}(A^{-1} m + X'y)$

(See Hamilton (1994), Chapter 12)

• In other words, the pdf of β, conditioning on the data, is normal with mean $m^*$ and variance matrix $\sigma^2 (X'X + A^{-1})^{-1}$.

• Note I: If we have a large variance A (a “diffuse prior”), our prior, m, will have a lower weight. As A→∞, $m^* \to (X'X)^{-1}X'y$ (OLS!)
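The posterior mean and variance above are easy to compute, and the diffuse-prior limit can be checked numerically. A minimal sketch with σ² treated as known, as in the example; the function name `posterior_beta` and the toy data are illustrative.

```python
import numpy as np


def posterior_beta(X, y, m, A, sigma2):
    """Posterior of beta under the prior beta | sigma^2 ~ N(m, sigma^2 * A), sigma^2 known."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A_inv = np.linalg.inv(np.asarray(A, dtype=float))
    precision = A_inv + X.T @ X
    m_star = np.linalg.solve(precision, A_inv @ np.asarray(m, dtype=float) + X.T @ y)
    cov = sigma2 * np.linalg.inv(precision)
    return m_star, cov


X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 2.0])
prior_mean = np.array([3.0, -1.0])

ols = np.linalg.solve(X.T @ X, X.T @ y)
diffuse, _ = posterior_beta(X, y, prior_mean, 1e8 * np.eye(2), sigma2=1.0)
tight, _ = posterior_beta(X, y, prior_mean, 1e-8 * np.eye(2), sigma2=1.0)
print(ols, diffuse)       # a diffuse prior reproduces OLS
print(prior_mean, tight)  # a very tight prior keeps the posterior at m
```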

• We can do similar calculations when we impose another prior on σ. But the results would change.

• Note II: We had to specify the priors and the distribution of the data. If we change any of these two assumptions, the results would change.

We get exact results, because we made distributional assumptions on the data.


SVOL Estimation is based on the idea of hierarchical structure:

- $f(y \mid \sigma_t^2)$ (distribution of the data given the volatilities)

- $f(\sigma_t^2 \mid \varphi)$ (distribution of the volatilities given the parameters)

- $f(\varphi)$ (distribution of the parameters)

Goal: To estimate the joint $f(\sigma_t^2, \varphi \mid y)$ (“posterior”)

Priors (Beliefs):

Normal-Gamma for $f(\varphi)$. (Standard Bayesian regression model)

Inverse-Gamma for $f(\sigma_v)$: proportional to $\exp(-\nu s_0^2/(2\sigma^2))/\sigma^{\nu+1}$.

Normals for ω, β.

Impose (assume) stationarity of $\sigma_t^2$. (Truncate β as necessary)

Algorithm: MCMC (JPR (1994).)

Augment the parameter space to include $\sigma_t^2$.

Using a proper prior for $f(\sigma_t^2, \varphi)$, the MCMC provides inference about the joint posterior $f(\sigma_t^2, \varphi \mid y)$.

Classic reference: Andersen (1994), Mathematical Finance.

Application to interest rates: Kalimipalli and Susmel (2004), JEF.