
Optimization in R: algorithms, sequencing, and automatic differentiation


Presentation Transcript


  1. Optimization in R: algorithms, sequencing, and automatic differentiation James Thorson Aug. 26, 2011

  2. Themes • Basic: Algorithms, Settings, Starting location • Intermediate: Sequenced optimization, Phasing, Parameterization, Standard errors • Advanced: Derivatives

  3. Outline • One-dimensional • Two-dimensional • Using derivatives

  4. One-Dimensional

  5. Basic: Algorithm • Characteristics • Very fast • Somewhat unstable • Process • Starts with two points • Moves in the direction of the better point • Then searches between the two best points optimize(f = , interval = , ...)
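
A minimal sketch of optimize() in action, minimizing a hypothetical one-dimensional objective (a normal negative log-likelihood for the mean, with the data x simulated here):

# Hypothetical objective: negative log-likelihood for a normal mean
nll <- function(mu, x) -sum(dnorm(x, mean = mu, sd = 1, log = TRUE))

set.seed(1)
x <- rnorm(100, mean = 2.5)

fit <- optimize(f = nll, interval = c(-10, 10), x = x)
fit$minimum    # estimated mean
fit$objective  # negative log-likelihood at the minimum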

  6. Basic: Algorithm

  7. Basic: Algorithm

  8. Intermediate: Sequenced Sequencing: • Using a stable but slow method • Then using a fast method for fine-tuning One-dimensional sequencing • Grid-search • Then use optimize()
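
A minimal sketch of this one-dimensional sequencing, reusing the hypothetical nll() and x from the optimize() sketch above: a coarse grid search locates a promising region, and optimize() then fine-tunes within it.

grid <- seq(-10, 10, by = 0.5)             # stable but slow: coarse grid search
nll_grid <- sapply(grid, nll, x = x)
best <- grid[which.min(nll_grid)]

# Fast fine-tuning: optimize() within one grid step of the best grid point
fit <- optimize(f = nll, interval = c(best - 0.5, best + 0.5), x = x)
fit$minimum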

  9. Intermediate: Sequenced

  10. Basic: Algorithms Other one-dimensional functions • uniroot – Finds a single point where f(∙) = 0 within a given interval • polyroot – Finds all (complex) roots of a polynomial
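
For illustration, a minimal sketch of both functions (the example equations are arbitrary):

uniroot(f = function(x) cos(x) - x, interval = c(0, 1))$root   # single root of a general function
polyroot(c(2, -3, 1))   # all (complex) roots of 2 - 3x + x^2; coefficients in increasing powers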

  11. Two-Dimensional

  12. Basic: Settings • trace = 1 • Means different things for different optimization routines • In general, gives output during optimization • Useful for diagnostics optimx(par = , fn = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))

  13. Basic: Settings optimx(par = , fn = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))

  14. Basic: Settings • follow.on = TRUE • Starts each subsequent method at the previous method's stopping point • method = c("nlminb", "L-BFGS-B") • Lists the set and order of methods to use optimx(par = , fn = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B")) calcMin() in "PBSmodelling" package
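
A minimal sketch of this optimx() call pattern, using the Rosenbrock "banana" function (introduced on a later slide) as a stand-in objective; the starting values and bounds are arbitrary:

library(optimx)

rosenbrock <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2

fit <- optimx(par = c(-1.2, 1), fn = rosenbrock,
              lower = c(-5, -5), upper = c(5, 5),
              method = c("nlminb", "L-BFGS-B"),
              control = list(trace = 1, follow.on = TRUE))
fit   # one row per method; follow.on = TRUE chains the starting values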

  15. Basic: Settings Constraints • Unbounded • Bounded • I recommend using bounds • Box constraints are common • Non-box constraints • Usually implemented in the objective function
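
A minimal sketch of a non-box constraint handled inside the objective function: a hypothetical quadratic objective returns a large penalty whenever the constraint p[1] + p[2] <= 1 is violated, so a derivative-free method simply avoids that region.

constrained_fn <- function(p) {
  if (p[1] + p[2] > 1) return(1e10)   # penalty for infeasible points
  (p[1] - 0.8)^2 + (p[2] - 0.8)^2     # hypothetical quadratic objective
}

optim(par = c(0, 0), fn = constrained_fn, method = "Nelder-Mead")$par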

  16. Basic: Algorithms Differences among algorithms: • Speed vs. accuracy • Unbounded vs. bounded • Can use derivatives

  17. Basic: Algorithms Nelder-Mead (a.k.a. “Simplex”) • Characteristics • Bounded (nlminb) • Unbounded (optimx) • Cannot use derivatives • Slow but good at following valleys • Easily stuck at local minima

  18. Basic: Algorithms Nelder-Mead (a.k.a. “Simplex”) • Process • Uses a simplex (polygon) with n+1 vertices • Reflects the worst point across the centroid of the others • If worse: shrink • If better: accept and expand along that axis
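
Nelder-Mead is the default method in optim(); a minimal sketch using the Rosenbrock function defined in the optimx() sketch above:

optim(par = c(-1.2, 1), fn = rosenbrock, method = "Nelder-Mead")$par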

  19. Basic: Algorithms

  20. Basic: Algorithms Rosenbrock “Banana” Function

  21. Basic: Algorithms Quasi-Newton (“BFGS”) • Characteristics • Unbounded (optim, method = “BFGS”) • Bounded (optim, method = “L-BFGS-B”) • Can use derivatives • Fast but less accurate

  22. Basic: Algorithms Quasi-Newton (“BFGS”) • Process • Approximates the gradient and Hessian • Uses Newton’s method to update the location • Updates the gradient and Hessian approximations from successive evaluations
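
A minimal sketch comparing the unbounded and bounded variants in optim(), again with the hypothetical Rosenbrock objective:

fit_u <- optim(par = c(-1.2, 1), fn = rosenbrock, method = "BFGS")
fit_b <- optim(par = c(-1.2, 1), fn = rosenbrock, method = "L-BFGS-B",
               lower = c(-5, -5), upper = c(5, 5))
rbind(unbounded = fit_u$par, bounded = fit_b$par)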

  23. Basic: Algorithms

  24. Basic: Algorithms Quasi-Newton (“ucminf”) • A different implementation of the quasi-Newton approach

  25. Basic: Algorithms

  26. Basic: Algorithms Conjugate gradient • Characteristics: • Unbounded (optim, method = “CG”) • Very fast for near-quadratic problems • Low memory • Highly unstable in general • I don’t recommend it for general usage

  27. Basic: Algorithms Conjugate gradient • Process • Numerical calculation of derivatives • Subsequent search directions are “conjugate” (i.e. they form an optimal linear basis for a quadratic problem)
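
A minimal sketch of conjugate gradient in optim(); on the banana-shaped Rosenbrock surface it typically needs many iterations, which illustrates the instability noted above:

optim(par = c(-1.2, 1), fn = rosenbrock, method = "CG",
      control = list(maxit = 1000))$par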

  28. Basic: Algorithms

  29. Basic: Algorithms Many others! As one example… Spectral projected gradient • Characteristics • ??? • Process • ???

  30. Basic: Algorithms

  31. Basic: Algorithms Accuracy trials

  32. Basic: starting location It’s important to provide a good starting location! • Some methods (like nlminb) find the nearest local minimum • Speeds convergence

  33. Intermediate: Parameterization Suggestions: • Put all parameters on a similar scale • Derivatives are approximately equal • One method: use exp() and plogis() transformations for inputs • Minimize covariance • Minimize changes in scale or covariance
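
A minimal sketch of estimating parameters on transformed scales so they are unconstrained and roughly comparable in magnitude: log for a positive standard deviation and logit (inverted with plogis) for a proportion. The likelihood and data are hypothetical.

nll_trans <- function(p, x, n, k) {
  sigma <- exp(p[1])     # sd constrained to be > 0
  prob  <- plogis(p[2])  # proportion constrained to (0, 1)
  -sum(dnorm(x, mean = 0, sd = sigma, log = TRUE)) -
    dbinom(k, size = n, prob = prob, log = TRUE)
}

set.seed(1)
x <- rnorm(50, sd = 2)
fit <- optim(par = c(0, 0), fn = nll_trans, x = x, n = 30, k = 12)
c(sigma = exp(fit$par[1]), prob = plogis(fit$par[2]))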

  34. Intermediate: Phasing Phasing • Estimate some parameters (with others fixed) in a first phase • Estimate more parameters in each phase • Eventually estimate all parameters Uses • Multi-species models • Estimate with linkages in later phases • Statistical catch-at-age • Estimate scale early
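
A minimal sketch of phasing with a hypothetical normal likelihood: phase 1 estimates the mean while log(sd) is held fixed, and phase 2 frees both parameters, starting from the phase-1 estimate.

nll2 <- function(p, x) -sum(dnorm(x, mean = p[1], sd = exp(p[2]), log = TRUE))

set.seed(1)
x <- rnorm(100, mean = 5, sd = 3)

# Phase 1: estimate the mean with log(sd) fixed at 0
phase1 <- optimize(function(mu) nll2(c(mu, 0), x), interval = c(-20, 20))

# Phase 2: estimate both parameters, starting from the phase-1 estimate
phase2 <- optim(par = c(phase1$minimum, 0), fn = nll2, x = x)
c(mean = phase2$par[1], sd = exp(phase2$par[2]))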

  35. Intermediate: Standard errors Maximum likelihood allows asymptotic estimates of standard errors • Calculate the Hessian matrix at the maximum likelihood estimate • Second derivatives of the negative log-likelihood function • Invert the Hessian • Diagonal entries are variances • Square roots are standard errors
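
A minimal sketch, continuing from the hypothetical nll2() and x in the phasing sketch: optim() can return the Hessian of the negative log-likelihood at the optimum, so inverting it gives the asymptotic covariance matrix directly.

fit <- optim(par = c(5, 1), fn = nll2, x = x, hessian = TRUE)
vcov_hat <- solve(fit$hessian)   # covariance matrix of the estimates
se <- sqrt(diag(vcov_hat))       # asymptotic standard errors
cbind(estimate = fit$par, std.error = se)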

  36. Intermediate: Standard errors Calculation of the Hessian depends on parameter transformations • When using exp() or logit() transformations, use the delta method to transform standard errors back to the natural (untransformed) scale
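
A minimal sketch of the delta method for the log(sd) parameter in the sketch above: the standard error on the natural scale is the absolute derivative of the back-transformation (here d exp(u)/du = exp(u)) times the standard error on the transformed scale.

sd_hat <- exp(fit$par[2])
se_sd  <- exp(fit$par[2]) * se[2]   # |derivative| at the estimate, times SE on log scale
c(sd = sd_hat, std.error = se_sd)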

  37. Intermediate: Standard errors

  38. Intermediate: Standard errors Gill and King (2004) “What to do when your Hessian is not invertible” gchol() – Generalized Cholesky (“kinship”) ginv() – Moore-Penrose Inverse (“MASS”)
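
As an illustration with MASS::ginv(), assuming the Hessian from the standard-error sketch above: if solve() fails because the Hessian is singular or nearly so, a generalized inverse gives an approximate covariance matrix instead.

library(MASS)
vcov_approx <- ginv(fit$hessian)   # Moore-Penrose inverse as a fallback for solve()
sqrt(diag(vcov_approx))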

  39. Intermediate: Standard errors [ Switch over to R-screen to show mle() and solve(hess()) ]

  40. Advanced: Differentiation Gradient: • Quasi-Newton • Conjugate gradient Hessian: • Quasi-Newton optimx(par = , fn = , gr = , hess = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))
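
A minimal sketch of supplying an analytical gradient through the gr argument, using the hypothetical Rosenbrock function from the earlier sketches:

rosen_gr <- function(p) {
  c(-2 * (1 - p[1]) - 400 * p[1] * (p[2] - p[1]^2),   # d/dp1
     200 * (p[2] - p[1]^2))                           # d/dp2
}

optim(par = c(-1.2, 1), fn = rosenbrock, gr = rosen_gr, method = "BFGS")$par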

  41. Advanced: Differentiation Automatic differentiation • AD Model Builder • “radx” package (still in development) Semi-automatic differentiation • “rSymPy” package Symbolic differentiation • deriv() BUT: None of these handle loops or sum/prod, so they’re not really helpful for statistics yet
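
A minimal sketch of symbolic differentiation with base R's deriv(), applied to the Rosenbrock expression: it returns a function that evaluates the expression and attaches the gradient as an attribute.

g <- deriv(~ (1 - x)^2 + 100 * (y - x^2)^2, namevec = c("x", "y"),
           function.arg = TRUE)
attr(g(-1.2, 1), "gradient")   # gradient at the starting values used above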

  42. Advanced: Differentiation Mixture distribution model (~ 15 params) • 10 seconds in R • 2 seconds in ADMB Multispecies catchability model (~ 150 params) • 4 hours in R (using trapezoid method) • 5 minutes in ADMB (using MCMC) Surplus production meta-analysis (~ 750 coefs) • 7 days in R (using trapezoid method) • 2 hours in ADMB (using trapezoid method)
