
CHAPTER 16: MARKOV CHAIN MONTE CARLO



  1. Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall. CHAPTER 16: MARKOV CHAIN MONTE CARLO. Organization of chapter in ISSO: • Background on MCMC • Metropolis-Hastings algorithm • Numerical example of Metropolis-Hastings • Gibbs sampling • Numerical example of Gibbs sampling • Optional in these slides: Non-Gaussian state estimation (not in ISSO)

  2. Background • Process generates random vector X with target distribution (density) p(·) • Want to compute E[f(X)] for a function f(·) • Standard method for approximating E[f(X)] is to generate many independent sample values of X and compute the sample mean of the f(X) values • Only useful in “trivial” cases where X can be generated directly • Many practical problems have a non-trivial distribution for X • E.g., state in a nonlinear/non-Gaussian state-space model

  3. Markov Chains • Not necessary to generate independent X values to estimate E[f(X)] • Consider a dependent sequence X0, X1, X2,… • Generate Xk+1 according to an “easy” conditional distribution for {Xk+1|Xk} • The {Xk} process is a Markov chain • Dependence of Xk on any fixed number of early states disappears as k gets large • Above implies the distribution of Xk approaches a stationary form as k gets large • Stationary form corresponds to the target distribution (density) p(·) if the conditional distribution is chosen properly

  4. Ergodic Averaging • Let M denote the “burn-in” period for the Markov chain • The ergodic average of the n – M values of f(X), with the Xk generated via a Markov chain, is (1/(n – M)) ∑k=M+1,…,n f(Xk) • Summands above are dependent via the Markov property for {Xk} • Above sum approaches E[f(X)] as n gets large by the ergodic theorem
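In code, the ergodic average is simply a sample mean over the dependent post burn-in draws. A minimal Python illustration (the function and argument names are placeholders, not from ISSO):

```python
import numpy as np

def ergodic_average(f, chain):
    # chain holds X_{M+1}, ..., X_n from a Markov chain, burn-in already removed;
    # the mean of f over these dependent draws estimates E[f(X)]
    return np.mean([f(x) for x in chain], axis=0)
```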

  5. Metropolis-Hastings (M-H) Algorithm • M-H algorithm is one of the two most popular forms of MCMC (the other is Gibbs sampling) • M-H relies on a proposal distribution and the Metropolis criterion • Let the proposal distribution be q(·|·); used to generate candidate points W ~ q(·|X = x) • Candidate point W = w is accepted with probability given by the Metropolis criterion: ρ(x, w) = min{p(w)q(x|w) / [p(x)q(w|x)], 1} • In practice, in going from Xk to Xk+1, x above is Xk and W becomes Xk+1 if W is accepted

  6. M-H Algorithm for Estimating E[f(X)] Step 0 (initialization) Choose the length of the “burn-in” period M and the initial state X0. Set k = 0. Step 1 (candidate point) Generate a candidate point W according to the proposal distribution q(·|Xk). Step 2 (accept/reject) Generate a point U from the U(0,1) distribution. Set Xk+1 = W if U ≤ ρ(Xk, W) (Metropolis criterion). Otherwise set Xk+1 = Xk. Step 3 (iterate) Repeat Steps 1 and 2 until XM is available. Terminate the “burn-in” process and proceed to Step 4 with Xk = XM. Steps 4–6 (ergodic average) Continue the process and compute the average of f(XM+1),…, f(Xn). This ergodic average is the estimate of E[f(X)].
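A minimal Python sketch of Steps 0–6 above. Here `log_p` (target log-density up to an additive constant), `propose` (draws W ~ q(·|x)), and `log_q(a, b)` (log q(a|b)) are user-supplied placeholders, not names from ISSO:

```python
import numpy as np

def metropolis_hastings(log_p, propose, log_q, x0, n, M, rng=None):
    """Run n M-H iterations from x0; return the chain with burn-in M removed."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    chain = []
    for _ in range(n):
        w = propose(x, rng)                             # Step 1: candidate point
        # Step 2: Metropolis criterion rho(x, w), in log form for stability
        log_rho = (log_p(w) + log_q(x, w)) - (log_p(x) + log_q(w, x))
        if np.log(rng.uniform()) <= min(log_rho, 0.0):  # accept with probability rho
            x = w
        chain.append(x.copy())
    return np.array(chain[M:])                          # Steps 3-4: drop burn-in

# Steps 4-6: the ergodic average of f over the returned chain estimates E[f(X)].
```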

  7. Example: Estimating E[f(X)] from a Bivariate Normal Distribution (Example 16.1 from ISSO) • Suppose X has a bivariate normal distribution with mean zero (see ISSO for the covariance matrix) • Use M-H to estimate the sum of the two mean components (true value = 0): f(X) = [1, 1]X • Standard (unit-length) uniform proposal distribution and burn-in period of M = 500 • Following plot shows three independent runs • Acceptance rate (Metropolis criterion) about 70% • Better performance possible with a lower acceptance rate (requires “tuning”—not always feasible in practice)
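A sketch of this example built on the `metropolis_hastings` function above. The covariance matrix and the half-width of the uniform random-walk proposal are illustrative assumptions; ISSO gives the exact specification:

```python
import numpy as np

Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])           # placeholder covariance, not from ISSO
Sigma_inv = np.linalg.inv(Sigma)

def log_p(x):                            # N(0, Sigma) log-density up to a constant
    return -0.5 * x @ Sigma_inv @ x

def propose(x, rng):                     # uniform random walk (symmetric proposal)
    return x + rng.uniform(-0.5, 0.5, size=2)

log_q = lambda a, b: 0.0                 # symmetric: q terms cancel in rho

chain = metropolis_hastings(log_p, propose, log_q,
                            x0=np.zeros(2), n=15500, M=500)
estimate = np.mean(chain @ np.ones(2))   # f(X) = [1, 1]X; true value is 0
```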

  8. Example (cont’d): M-H Algorithm with Uniform Proposal Distribution; Mean Zero Target [Figure: three sample paths of the running estimate; vertical axis from –3 to 3, horizontal axis 0 to 15000 iterations (post burn-in)]

  9. Gibbs Sampling • Gibbs sampling is an implementation of M-H on an element-by-element basis • Gibbs sampling is specifically designed for multivariate problems, i.e., dim(X) ≥ 2 • Gibbs sampling is based on the idea of “full conditional” distributions • The ith full conditional distribution is the conditional distribution for the ith component of X, conditioned on the most recent values of all other components of X • In contrast to standard M-H, Gibbs sampling updates the components of X one at a time

  10. Relationship of Gibbs Sampling to M-H • Gibbs sampling is a special case of M-H on an element-by-element basis • Gibbs sampling and M-H were developed largely independently of each other • M-H introduced in Hastings (1970) as an implementation of Metropolis sampling from statistical physics • Gibbs introduced in Tanner and Wong (1987) and Gelfand and Smith (1990), with special focus on Bayesian problems • Gibbs sampling uses a particular form of the full conditionals as the proposal distribution for M-H • Eliminates the need to “tune” the proposal distribution as in general M-H • Requires stronger assumptions to construct the full conditionals • Acceptance rate for new points is 100%
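A minimal Python sketch of the Gibbs sweep described above. `sample_full_cond(i, x, rng)` is a placeholder that must return a draw from the ith full conditional given the most recent values of the other components:

```python
import numpy as np

def gibbs(sample_full_cond, x0, n, M, rng=None):
    """Run n Gibbs sweeps from x0; return the chain with burn-in M removed."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    chain = []
    for _ in range(n):
        for i in range(len(x)):              # one-at-a-time component updates;
            x[i] = sample_full_cond(i, x, rng)  # every draw is accepted (rate 100%)
        chain.append(x.copy())
    return np.array(chain[M:])               # burn-in removed, as in M-H
```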

  11. Example: Truncated Exponential Distribution (Example 16.5 from ISSO) • Consider a two-variable problem where the conditional random variables {X|Y} and {Y|X} have exponential distributions over a finite interval (length = 5) • The distributions for {X|Y} and {Y|X} are the two full conditionals for Gibbs sampling • Suppose we are interested in the marginal distribution for X • Can determine the exact marginal distribution for X • Useful for comparison purposes; not usually available in practice • Plot shows Gibbs output relative to the true density for X • Histogram based on the terminal X value from 5000 independent replications • Burn-in period of M = 10; terminal value occurs 30 iterations past burn-in
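A sketch of this example built on the `gibbs` function above, assuming the standard form of the problem: each full conditional is exponential with rate equal to the other variable's current value, truncated to [0, 5] (see ISSO Example 16.5 for the exact setup). Inverse-CDF sampling handles the truncation:

```python
import numpy as np

B = 5.0                                      # length of the finite interval

def sample_trunc_exp(rate, rng):
    # Inverse-CDF draw from Exp(rate) truncated to [0, B]
    u = rng.uniform()
    return -np.log(1.0 - u * (1.0 - np.exp(-rate * B))) / rate

def full_cond(i, x, rng):
    # rate = most recent value of the other variable (assumed form; see lead-in)
    return sample_trunc_exp(x[1 - i], rng)

rng = np.random.default_rng()
# Terminal X from 5000 independent replications: M = 10 burn-in plus 30 further
# iterations per run, keeping only the final X value (as in the slide).
finals = [gibbs(full_cond, [1.0, 1.0], n=40, M=39, rng=rng)[-1, 0]
          for _ in range(5000)]
# A histogram of `finals` approximates the true marginal density of X.
```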

  12. Example (cont’d): Histogram of Gibbs Sampling Output vs. Known Density

  13. Optional (not in ISSO): Non-Gaussian State Estimation • Consider a state-space model with non-Gaussian noises (xk is the state; zk is the measurement) • Represent p(xk|xk–1) and p(zk|xk) as Gaussian mixtures • Gaussian mixtures can be used to approximate many non-Gaussian distributions • Gibbs sampling used to estimate the state based on Gaussian full conditionals • Further information on pp. 43–44 of: Spall, J. C. (2003), “Estimation via Markov Chain Monte Carlo,” IEEE Control Systems Magazine, vol. 23(2), pp. 34–45

  14. Non-Gaussian State Estimation: Basic Idea • Let θ represent the parameters in the Gaussian mixtures • Xn and Zn are the complete collections of all (n) states and measurements • Gibbs sampling operates from the full conditionals: {xk | xk–1, θ, Zn}— Gaussian distribution; {θ | Xn, Zn}— non-Gaussian distribution • The above non-Gaussian distribution is known for many cases • Iterative sampling from the above full conditionals produces samples from p(xk|Zn) for all k • Average the samples to get E(xk|Zn)

  15. Concluding Remarks • M-H and Gibbs sampling are two notable examples of MCMC • These methods allow “easy” generation of random samples and estimates • M-H is more general, but Gibbs is especially useful in specific applications • Not “magic”—still need relevant assumptions • Widespread use in statistics, computer science, simulation, etc. • Limited current use in control and signal processing • But non-Gaussian/nonlinear state estimation is one growing area

  16. Exercise 16.3: Four replications of M-H Algorithm with Mean Zero Target

  17. Exercise 16.8: Histogram (2000 samples) and Known Density for X
