  1. January 2007. Bayesian methods for parameter estimation and data assimilation with crop models. Part 4: The Metropolis-Hastings algorithm. David Makowski and Daniel Wallach, INRA, France.

  2. Previously • Bayes’ Theorem. • Approximation of the posterior distribution with the Importance Sampling algorithm. • Implementation with the R statistical software. • Application to the estimation of one parameter.

  3. Objectives of part 4 • Present another algorithm for approximating the posterior probability distribution: the Metropolis-Hastings algorithm. • Illustrate it with two examples: the first has one parameter, the second three parameters. • Provide a program in the R language that you can run to apply Metropolis-Hastings to the examples. • R is free (see http://www.r-project.org/).

  4. Two approaches for approximating posterior distributions from Monte Carlo simulations • 1. Non-adaptive methods. All parameter vectors can be generated at the start of the procedure; the choice of parameter vectors to be tested does not depend on the results for previously tested vectors. Example: Importance Sampling (see part 3). • 2. Markov chain Monte Carlo (MCMC) methods. Parameter values are generated from a Markov chain: the parameter value to be tested at stage i+1 can depend on the parameter value at stage i. The most important methods are the Metropolis-Hastings algorithm and Gibbs sampling.

  5. The Metropolis-Hastings algorithm – general case
  Step 0. Choose a starting value θ1 and define a proposal distribution Pp(θc|θi) (for example, a normal distribution with mean equal to θi). Repeat steps 1-3 for i = 1, …, N.
  Step 1. Generate a candidate parameter value θc from Pp(θc|θi).
  Step 2. Calculate T = [ P(Y|θc) P(θc) Pp(θi|θc) ] / [ P(Y|θi) P(θi) Pp(θc|θi) ].
  Step 3. If T ≥ 1, then θi+1 = θc. If T < 1, draw u from a uniform distribution on the interval (0, 1); if u < T, then θi+1 = θc, otherwise θi+1 = θi.
  The result of the algorithm is a list of N parameter values; the same value may be repeated several times. A minimal R sketch of this loop follows.
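A minimal R sketch of the general loop above (the function names log_post, rprop, and dprop are illustrative placeholders, not part of the distributed programs; log_post must return the unnormalized log-posterior log[P(Y|θ)P(θ)]). Working on the log scale avoids numerical underflow:

    # General Metropolis-Hastings loop (sketch; names are assumptions).
    # log_post(theta): log of P(Y|theta) * P(theta), up to a constant
    # rprop(theta_i):  draws one candidate from Pp(. | theta_i)
    # dprop(x, given): proposal density Pp(x | given)
    mh_chain <- function(log_post, rprop, dprop, theta1, N) {
      chain <- numeric(N)
      chain[1] <- theta1
      for (i in 1:(N - 1)) {
        theta_c <- rprop(chain[i])                                      # Step 1
        log_T <- log_post(theta_c) + log(dprop(chain[i], theta_c)) -
                 log_post(chain[i]) - log(dprop(theta_c, chain[i]))     # Step 2
        if (log(runif(1)) < min(0, log_T)) {                            # Step 3
          chain[i + 1] <- theta_c       # accept the candidate
        } else {
          chain[i + 1] <- chain[i]      # repeat the current value
        }
      }
      chain
    }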

  6. The Metropolis-Hastings algorithm with a symmetric proposal distribution • A common choice for the proposal distribution Pp(θc|θi) is a normal distribution with mean equal to θi and constant variance. • In this case Pp(θc|θi) = Pp(θi|θc), and the expression for T simplifies to T = [ P(Y|θc) P(θc) ] / [ P(Y|θi) P(θi) ], i.e. the product of likelihood and prior density at the candidate value, divided by the same product at the current value.

  7. The Metropolis-Hastings algorithm – choices to be made • The proposal distribution. The number of iterations required for a good approximation of the posterior distribution depends on this choice. • The number of iterations. A large number is generally necessary (N = 10000, 100000 or more, depending on the problem). • The number of parameter values at the start of the chain to discard (to reduce dependence on the value chosen for starting the algorithm), as illustrated in the sketch below. • We give some suggestions for these choices with example 1.
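As a toy illustration of the third choice, the snippet below (an assumption made just for this illustration, using the mh_chain sketch above with a standard normal target) starts the chain far from the target and then drops the first half as burn-in:

    # Toy illustration of burn-in with a N(0, 1) target density.
    chain <- mh_chain(log_post = function(t) dnorm(t, 0, 1, log = TRUE),
                      rprop    = function(t) rnorm(1, mean = t, sd = 1),
                      dprop    = function(x, given) dnorm(x, mean = given, sd = 1),
                      theta1   = 10,      # deliberately poor starting value
                      N        = 10000)
    kept <- chain[-(1:5000)]              # discard the first half as burn-in
    c(mean = mean(kept), sd = sd(kept))   # should be close to 0 and 1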

  8. Example 1 Example already presented in parts 2 and 3: Estimation of crop yield. • The single unknown parameter is the yield of a particular field. The prior information is an expert’s opinion. There is also information from a measurement. Both the prior density and the likelihood are normal distributions. • In this case, the exact expression of the posterior distribution is known. • This example is used to show that the Metropolis-Hastings method can give a good approximation of the posterior distribution.

  9. Example 1 – exact posterior distribution From part 2, we have: • Measurement: Y = 9 t/ha (sd = 1). • Prior distribution: P(θ) = N(5, 2²). • Likelihood: P(Y|θ) = N(θ, 1²). • Exact posterior distribution: P(θ|Y) = N(8.2, 0.89²), i.e. posterior variance 0.8 (see the formulas below). [Figure: prior probability distribution, likelihood function, and posterior probability distribution.]
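For reference, these values follow from the standard formulas for combining a normal prior with a normal measurement: the posterior precision is the sum of the prior and measurement precisions, and the posterior mean is the precision-weighted average,

    \sigma_{post}^2 = \left( \frac{1}{2^2} + \frac{1}{1^2} \right)^{-1} = 0.8,
    \qquad
    \mu_{post} = \sigma_{post}^2 \left( \frac{5}{2^2} + \frac{9}{1^2} \right) = 0.8 \times 10.25 = 8.2.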

  10. Example 1 – Metropolis-Hastings
  Step 0. Choose θ1 = 5 t/ha. As proposal distribution, use a normal distribution: Pp(θc|θi) = N(θi, 0.8²). Repeat steps 1-3 for i = 1, …, N.
  Step 1. Generate a candidate parameter value θc from Pp(θc|θi).
  Step 2. Calculate T = [ P(Y|θc) P(θc) ] / [ P(Y|θi) P(θi) ], with P(Y|θ) the N(θ, 1²) likelihood density and P(θ) the N(5, 2²) prior density.
  Step 3. If u < min(1, T), where u is drawn from a uniform distribution on the interval (0, 1), then θi+1 = θc; otherwise θi+1 = θi.
  A runnable sketch of this computation follows.
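The distributed program is MHyield.txt (slide 15); the following self-contained sketch performs the same computation under the settings above (an illustration, not the original program):

    # Metropolis-Hastings for example 1 (sketch).
    Y <- 9                                 # measurement (t/ha), sd = 1
    log_post <- function(theta) {
      dnorm(Y, mean = theta, sd = 1, log = TRUE) +   # log-likelihood
        dnorm(theta, mean = 5, sd = 2, log = TRUE)   # log-prior, N(5, 2^2)
    }
    N <- 50000
    chain <- numeric(N)
    chain[1] <- 5                          # starting value theta_1 = 5 t/ha
    for (i in 1:(N - 1)) {
      theta_c <- rnorm(1, mean = chain[i], sd = 0.8)   # symmetric proposal
      log_T <- log_post(theta_c) - log_post(chain[i])  # proposal terms cancel
      chain[i + 1] <- if (log(runif(1)) < min(0, log_T)) theta_c else chain[i]
    }
    kept <- chain[(N / 2 + 1):N]           # discard the first half as burn-in
    c(mean = mean(kept), sd = sd(kept))    # should approach 8.2 and about 0.89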

  11. Example 1 – results, N = 500, chain 1. [Figure: chain of parameter values; posterior distribution approximated from the last 250 values, compared with the true posterior distribution; the measurement is marked.]

  12. Example 1 – results, N = 500, chain 2. [Figure: chain of parameter values; posterior distribution approximated from the last 250 values, compared with the true posterior distribution; the measurement is marked.]

  13. Example 1 – results, N = 50000, chain 1. [Figure: chain of parameter values; posterior distribution approximated from the last 25000 values, compared with the true posterior distribution; the measurement is marked.]

  14. Example 1 – results, N = 50000, chain 2. [Figure: chain of parameter values; posterior distribution approximated from the last 25000 values, compared with the true posterior distribution; the measurement is marked.]

  15. Example 1 – running the R program • Install the file "MHyield.txt" on your computer and note the path. This file contains the R program that does the calculations. • Open R. • Use the "source" command to run the program. The command is given as a comment in the first line of the program. In my case, I had to type: source("c:\\David\\Enseignements\\Cours ICASA\\MHyield.txt"). You must replace the path name with your own path name. • Copy and paste the corrected command (without the "#" character) into the R console and press RETURN to execute. • You can easily change the value of N, the measurement value, its accuracy, etc. See the comments in my R function MHyield.

  16. Example 2 Estimation of the three parameters of a model of yield response to fertilizer. • Nonlinear model relating wheat yield to nitrogen fertilizer dose: Yield = θ1 + θ2 (Dose - θ3) if Dose < θ3; Yield = θ1 if Dose ≥ θ3. • Objective: estimation of the three parameters for a given wheat field. [Figure: linear-plus-plateau response curve, rising with slope θ2 from dose 0 up to the threshold θ3, then flat at the plateau yield θ1.] An R sketch of this response function follows.
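In R, the response curve can be written in a few lines (a sketch; the function name is illustrative and not taken from the distributed program):

    # Linear-plus-plateau response: wheat yield as a function of N dose.
    # theta = c(theta1, theta2, theta3) = (plateau yield, slope, dose threshold).
    yield_response <- function(dose, theta) {
      ifelse(dose < theta[3],
             theta[1] + theta[2] * (dose - theta[3]),  # linear part
             theta[1])                                 # plateau
    }

For example, at the prior means given on the next slide, yield_response(0, c(9.18, 0.026, 123.85)) returns about 5.96 t/ha, the expected yield at a zero dose.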

  17. Example 2 – prior distribution • The prior distribution of the parameters was defined in a previous study (Makowski and Lavielle, 2006, JABES 11, 45-60). • It represents the between-field variability of the parameter values in a region (the Paris Basin). • P(θ1) = N(9.18, 1.16²) t/ha (maximal yield value). • P(θ2) = N(0.026, 0.0065²) t/kg N (slope of the linear part). • P(θ3) = N(123.85, 46.7²) kg N/ha (N dose threshold). • The prior means define the "average" response curve in the region of interest.

  18. Example 2 – data Data collected in a new wheat plot in the same region: four yield measurements obtained in this plot for four different N doses.
  Dose (kg N/ha):    0     50    100    200
  Yield (t/ha):    2.50   5.01   7.45   7.51

  19. Example 2 – likelihood Assuming independent normal measurement errors,
  P(Y|θ) = ∏_{j=1,…,4} (1 / (σ √(2π))) exp( -(Yj - f(Dj; θ))² / (2σ²) ),
  where f(D; θ) is the linear-plus-plateau response function (D = N dose). σ was estimated in a previous study and is set equal to 0.3. A sketch of the corresponding computation in R follows.
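A sketch of the corresponding log-posterior and a random-walk Metropolis-Hastings update for the three parameters (the distributed program is MHresponse.txt; the proposal standard deviations below are illustrative assumptions, not values taken from that program):

    # Log-posterior for example 2 (sketch); uses yield_response() above.
    doses  <- c(0, 50, 100, 200)            # kg N/ha
    yields <- c(2.50, 5.01, 7.45, 7.51)     # t/ha
    sigma  <- 0.3                           # residual sd, fixed
    log_post3 <- function(theta) {
      sum(dnorm(yields, mean = yield_response(doses, theta),
                sd = sigma, log = TRUE)) +               # log-likelihood
        dnorm(theta[1], 9.18,   1.16,   log = TRUE) +    # prior on theta1
        dnorm(theta[2], 0.026,  0.0065, log = TRUE) +    # prior on theta2
        dnorm(theta[3], 123.85, 46.7,   log = TRUE)      # prior on theta3
    }
    # Random-walk MH on the parameter vector (proposal sds are assumptions).
    N <- 50000
    chain <- matrix(NA, nrow = N, ncol = 3)
    chain[1, ] <- c(9.18, 0.026, 123.85)    # start at the prior means
    prop_sd <- c(0.3, 0.002, 10)
    for (i in 1:(N - 1)) {
      theta_c <- rnorm(3, mean = chain[i, ], sd = prop_sd)
      log_T <- log_post3(theta_c) - log_post3(chain[i, ])
      chain[i + 1, ] <- if (log(runif(1)) < min(0, log_T)) theta_c else chain[i, ]
    }
    colMeans(chain[(N / 2 + 1):N, ])        # posterior means after burn-in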

  20. Example 2 – results with N = 50000, chain 1. [Figure: chains of the three parameter values.]

  21. Example 2 – results with N = 50000, chain 1. [Figure: response curve based on the prior means compared with the curve based on the posterior means.]

  22. Example 2 – results with N = 50000, chain 2. [Figure: chains of the three parameter values.]

  23. Example 2 – results with N = 50000, chain 2. [Figure: response curve based on the prior means compared with the curve based on the posterior means.]

  24. Example 2 – running the R program • The R program is in the file MHresponse.txt. • To run this function yourself, follow the instructions given for example 1. • Press RETURN after the first series of graphs to obtain the second series.

  25. Conclusion Importance Sampling versus MCMC • Both methods can be used to approximate the posterior distribution of the parameters of any model. • Both methods require the definition of a proposal distribution to generate parameter values, which is not easy in practice. • The comparison of the two types of methods is an active area of research. • MCMC methods (Gibbs sampling and Metropolis-Hastings) can be easily implemented with the WinBUGS software. See http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml.
