
Optimization in R: algorithms, sequencing, and automatic differentiation


Presentation Transcript


  1. Optimization in R: algorithms, sequencing, and automatic differentiation James Thorson Aug. 26, 2011

  2. Themes • Basic: Algorithms, Settings, Starting location • Intermediate: Sequenced optimization, Phasing, Parameterization, Standard errors • Advanced: Derivatives

  3. Outline • One-dimensional • Two-dimensional • Using derivatives

  4. One-Dimensional

  5. Basic: Algorithm • Characteristics • Very fast • Somewhat unstable • Process • Starts with two points • Moves in the direction of the better point • Then searches between the two best points optimize(f = , interval = , ...)
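
A minimal sketch of optimize() in action, minimizing a hypothetical one-dimensional objective (a normal negative log-likelihood for the mean, with the data x simulated here):

# Hypothetical objective: negative log-likelihood for a normal mean
nll <- function(mu, x) -sum(dnorm(x, mean = mu, sd = 1, log = TRUE))

set.seed(1)
x <- rnorm(100, mean = 2.5)

fit <- optimize(f = nll, interval = c(-10, 10), x = x)
fit$minimum    # estimated mean
fit$objective  # negative log-likelihood at the minimum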

  6. Basic: Algorithm

  7. Basic: Algorithm

  8. Intermediate: Sequenced Sequencing: • Using a stable but slow method • Then using a fast method for fine-tuning One-dimensional sequencing • Grid-search • Then use optimize()
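
A minimal sketch of this one-dimensional sequencing, reusing the hypothetical nll() and x from the optimize() sketch above: a coarse grid search locates a promising region, and optimize() then fine-tunes within it.

grid <- seq(-10, 10, by = 0.5)             # stable but slow: coarse grid search
nll_grid <- sapply(grid, nll, x = x)
best <- grid[which.min(nll_grid)]

# Fast fine-tuning: optimize() within one grid step of the best grid point
fit <- optimize(f = nll, interval = c(best - 0.5, best + 0.5), x = x)
fit$minimum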

  9. Intermediate: Sequenced

  10. Basic: Algorithms Other one-dimensional functions • uniroot – Finds a single point where f(∙) = 0 within a given interval • polyroot – Finds all (complex) roots of a polynomial
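
For illustration, a minimal sketch of both functions (the example equations are arbitrary):

uniroot(f = function(x) cos(x) - x, interval = c(0, 1))$root   # single root of a general function
polyroot(c(2, -3, 1))   # all (complex) roots of 2 - 3x + x^2; coefficients in increasing powers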

  11. Two-Dimensional

  12. Basic: Settings • trace = 1 • Means different things for different optimization routines • In general, gives output during optimization • Useful for diagnostics optimx(par = , fn = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))

  13. Basic: Settings optimx(par = , fn = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))

  14. Basic: Settings • follow.on = TRUE • Starts each subsequent method at the previous method's stopping point • method = c("nlminb", "L-BFGS-B") • Lists the set and order of methods to use optimx(par = , fn = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B")) calcMin() in "PBSmodelling" package
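
A minimal sketch of this optimx() call pattern, using the Rosenbrock "banana" function (introduced on a later slide) as a stand-in objective; the starting values and bounds are arbitrary:

library(optimx)

rosenbrock <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2

fit <- optimx(par = c(-1.2, 1), fn = rosenbrock,
              lower = c(-5, -5), upper = c(5, 5),
              method = c("nlminb", "L-BFGS-B"),
              control = list(trace = 1, follow.on = TRUE))
fit   # one row per method; follow.on = TRUE chains the starting values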

  15. Basic: Settings Constraints • Unbounded • Bounded • I recommend using bounds • Box constraints are common • Non-box constraints • Usually implemented in the objective function
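
A minimal sketch of a non-box constraint handled inside the objective function: a hypothetical quadratic objective returns a large penalty whenever the constraint p[1] + p[2] <= 1 is violated, so a derivative-free method simply avoids that region.

constrained_fn <- function(p) {
  if (p[1] + p[2] > 1) return(1e10)   # penalty for infeasible points
  (p[1] - 0.8)^2 + (p[2] - 0.8)^2     # hypothetical quadratic objective
}

optim(par = c(0, 0), fn = constrained_fn, method = "Nelder-Mead")$par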

  16. Basic: Algorithms Differences among algorithms: • Speed vs. accuracy • Unbounded vs. bounded • Can use derivatives

  17. Basic: Algorithms Nelder-Mead (a.k.a. “Simplex”) • Characteristics • Bounded (nlminb) • Unbounded (optimx) • Cannot use derivatives • Slow but good at following valleys • Easily stuck at local minima

  18. Basic: Algorithms Nelder-Mead (a.k.a. “Simplex”) • Process • Uses a simplex (polygon) with n+1 vertices • Reflects the worst point across the centroid of the others • If worse: shrink • If better: accept and expand along that axis
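
Nelder-Mead is the default method in optim(); a minimal sketch using the Rosenbrock function defined in the optimx() sketch above:

optim(par = c(-1.2, 1), fn = rosenbrock, method = "Nelder-Mead")$par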

  19. Basic: Algorithms

  20. Basic: Algorithms Rosenbrock “Banana” Function

  21. Basic: Algorithms Quasi-Newton (“BFGS”) • Characteristics • Unbounded (optim, method = “BFGS”) • Bounded (optim, method = “L-BFGS-B”) • Can use derivatives • Fast but less accurate

  22. Basic: Algorithms Quasi-Newton (“BFGS”) • Process • Approximates the gradient and Hessian • Uses Newton’s method to update the location • Updates the gradient and Hessian approximations from successive evaluations
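
A minimal sketch comparing the unbounded and bounded variants in optim(), again with the hypothetical Rosenbrock objective:

fit_u <- optim(par = c(-1.2, 1), fn = rosenbrock, method = "BFGS")
fit_b <- optim(par = c(-1.2, 1), fn = rosenbrock, method = "L-BFGS-B",
               lower = c(-5, -5), upper = c(5, 5))
rbind(unbounded = fit_u$par, bounded = fit_b$par)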

  23. Basic: Algorithms

  24. Basic: Algorithms Quasi-Newton (“ucminf”) • A different implementation of the quasi-Newton approach

  25. Basic: Algorithms

  26. Basic: Algorithms Conjugate gradient • Characteristics: • Unbounded (optim, method = “CG”) • Very fast for near-quadratic problems • Low memory • Highly unstable in general • I don’t recommend it for general usage

  27. Basic: Algorithms Conjugate gradient • Process • Numerical calculation of derivatives • Subsequent search directions are “conjugate” (i.e. they form an optimal linear basis for a quadratic problem)
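
A minimal sketch of conjugate gradient in optim(); on the banana-shaped Rosenbrock surface it typically needs many iterations, which illustrates the instability noted above:

optim(par = c(-1.2, 1), fn = rosenbrock, method = "CG",
      control = list(maxit = 1000))$par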

  28. Basic: Algorithms

  29. Basic: Algorithms Many others! As one example… Spectral projected gradient • Characteristics • ??? • Process • ???

  30. Basic: Algorithms

  31. Basic: Algorithms Accuracy trials

  32. Basic: starting location It’s important to provide a good starting location! • Some methods (like nlminb) find the nearest local minimum • Speeds convergence

  33. Intermediate: Parameterization Suggestions: • Put all parameters on a similar scale • Derivatives are approximately equal • One method: use exp() and plogis() transformations for inputs • Minimize covariance • Minimize changes in scale or covariance
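
A minimal sketch of estimating parameters on transformed scales so they are unconstrained and roughly comparable in magnitude: log for a positive standard deviation and logit (inverted with plogis) for a proportion. The likelihood and data are hypothetical.

nll_trans <- function(p, x, n, k) {
  sigma <- exp(p[1])     # sd constrained to be > 0
  prob  <- plogis(p[2])  # proportion constrained to (0, 1)
  -sum(dnorm(x, mean = 0, sd = sigma, log = TRUE)) -
    dbinom(k, size = n, prob = prob, log = TRUE)
}

set.seed(1)
x <- rnorm(50, sd = 2)
fit <- optim(par = c(0, 0), fn = nll_trans, x = x, n = 30, k = 12)
c(sigma = exp(fit$par[1]), prob = plogis(fit$par[2]))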

  34. Intermediate: Phasing Phasing • Estimate some parameters (with others fixed) in a first phase • Estimate more parameters in each phase • Eventually estimate all parameters Uses • Multi-species models • Estimate with linkages in later phases • Statistical catch-at-age • Estimate scale early
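
A minimal sketch of phasing with a hypothetical normal likelihood: phase 1 estimates the mean while log(sd) is held fixed, and phase 2 frees both parameters, starting from the phase-1 estimate.

nll2 <- function(p, x) -sum(dnorm(x, mean = p[1], sd = exp(p[2]), log = TRUE))

set.seed(1)
x <- rnorm(100, mean = 5, sd = 3)

# Phase 1: estimate the mean with log(sd) fixed at 0
phase1 <- optimize(function(mu) nll2(c(mu, 0), x), interval = c(-20, 20))

# Phase 2: estimate both parameters, starting from the phase-1 estimate
phase2 <- optim(par = c(phase1$minimum, 0), fn = nll2, x = x)
c(mean = phase2$par[1], sd = exp(phase2$par[2]))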

  35. Intermediate: Standard errors Maximum likelihood allows asymptotic estimates of standard errors • Calculate the Hessian matrix at the maximum likelihood estimate • Second derivatives of the negative log-likelihood function • Invert the Hessian • Diagonal entries are variances • Square roots are standard errors
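
A minimal sketch, continuing from the hypothetical nll2() and x in the phasing sketch: optim() can return the Hessian of the negative log-likelihood at the optimum, so inverting it gives the asymptotic covariance matrix directly.

fit <- optim(par = c(5, 1), fn = nll2, x = x, hessian = TRUE)
vcov_hat <- solve(fit$hessian)   # covariance matrix of the estimates
se <- sqrt(diag(vcov_hat))       # asymptotic standard errors
cbind(estimate = fit$par, std.error = se)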

  36. Intermediate: Standard errors Calculation of the Hessian depends on parameter transformations • When using exp() or logit() transformations, use the delta method to transform standard errors back to the natural (untransformed) scale
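
A minimal sketch of the delta method for the log(sd) parameter in the sketch above: the standard error on the natural scale is the absolute derivative of the back-transformation (here d exp(u)/du = exp(u)) times the standard error on the transformed scale.

sd_hat <- exp(fit$par[2])
se_sd  <- exp(fit$par[2]) * se[2]   # |derivative| at the estimate, times SE on log scale
c(sd = sd_hat, std.error = se_sd)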

  37. Intermediate: Standard errors

  38. Intermediate: Standard errors Gill and King (2004) “What to do when your Hessian is not invertible” gchol() – Generalized Cholesky (“kinship”) ginv() – Moore-Penrose Inverse (“MASS”)
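
As an illustration with MASS::ginv(), assuming the Hessian from the standard-error sketch above: if solve() fails because the Hessian is singular or nearly so, a generalized inverse gives an approximate covariance matrix instead.

library(MASS)
vcov_approx <- ginv(fit$hessian)   # Moore-Penrose inverse as a fallback for solve()
sqrt(diag(vcov_approx))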

  39. Intermediate: Standard errors [ Switch over to R-screen to show mle() and solve(hess()) ]

  40. Advanced: Differentiation Gradient: • Quasi-Newton • Conjugate gradient Hessian: • Quasi-Newton optimx(par = , fn = , gr = , hess = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))
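
A minimal sketch of supplying an analytical gradient through the gr argument, using the hypothetical Rosenbrock function from the earlier sketches:

rosen_gr <- function(p) {
  c(-2 * (1 - p[1]) - 400 * p[1] * (p[2] - p[1]^2),   # d/dp1
     200 * (p[2] - p[1]^2))                           # d/dp2
}

optim(par = c(-1.2, 1), fn = rosenbrock, gr = rosen_gr, method = "BFGS")$par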

  41. Advanced: Differentiation Automatic differentiation • AD Model Builder • “radx” package (still in development) Semi-automatic differentiation • “rSymPy” package Symbolic differentiation • deriv() BUT: None of these handle loops or sum/prod, so they’re not really helpful for statistics yet
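
A minimal sketch of symbolic differentiation with base R's deriv(), applied to the Rosenbrock expression: it returns a function that evaluates the expression and attaches the gradient as an attribute.

g <- deriv(~ (1 - x)^2 + 100 * (y - x^2)^2, namevec = c("x", "y"),
           function.arg = TRUE)
attr(g(-1.2, 1), "gradient")   # gradient at the starting values used above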

  42. Advanced: Differentiation Mixture distribution model (~ 15 params) • 10 seconds in R • 2 seconds in ADMB Multispecies catchability model (~ 150 params) • 4 hours in R (using trapezoid method) • 5 minutes in ADMB (using MCMC) Surplus production meta-analysis (~ 750 coefs) • 7 days in R (using trapezoid method) • 2 hours in ADMB (using trapezoid method)
