

  1. CHEE801 Module 5: Nonlinear Regression J. McLellan

  2. Outline - Single response • Notation • Assumptions • Least Squares Estimation – Gauss-Newton Iteration, convergence criteria, numerical optimization • Diagnostics • Properties of Estimators and Inference • Other estimation formulations – maximum likelihood and Bayesian estimators • Dealing with differential equation models • And then on to multi-response… J. McLellan

  3. Notation Model specification – • the model equation is y_i = f(x_i, θ) + ε_i, where x_i is the vector of explanatory variables (the ith run conditions), θ is the p-dimensional vector of parameters, and ε_i is the random noise component • with n experimental runs, we have the stacked form η(θ) = [f(x_1, θ), …, f(x_n, θ)]^T • as θ ranges over the parameter space, η(θ) defines the expectation surface • the nonlinear regression model is y = η(θ) + ε • Model specification involves the form of the equation and its parameterization J. McLellan

  4. Parameter Estimation – Linear Regression Case [Figure: geometric view of linear least squares – the observation vector, the approximating observation vector on the (planar) expectation surface, and the residual vector joining them] J. McLellan

  5. Parameter Estimation - Nonlinear Regression Case [Figure: geometric view of nonlinear least squares – the observation vector, the approximating observation vector, and the residual vector, now relative to a curved expectation surface] J. McLellan

  6. Parameter Estimation – Gauss-Newton Iteration Least squares estimation – minimize the sum of squares S(θ) = ||y − η(θ)||² = Σ_i (y_i − f(x_i, θ))² Iterative procedure consisting of: • Linearization about the current estimate of the parameters • Solution of the linear(ized) regression problem to obtain the next parameter estimate • Iteration until a convergence criterion is satisfied J. McLellan

  7. Linearization about a nominal parameter vector Linearize the expectation function η(θ) in terms of the parameter vector θ about a nominal vector θ0: η(θ) ≈ η(θ0) + V0 (θ − θ0) • Sensitivity Matrix V0, with entries [V0]_ij = ∂η_i/∂θ_j evaluated at θ0 • Jacobian of the expectation function • contains first-order sensitivity information J. McLellan
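As a concrete illustration of the sensitivity matrix, the following Python sketch approximates V by forward finite differences. The function and argument names (model, jacobian_fd, the step size h) are illustrative choices, not part of the original slides; an analytical Jacobian would normally be preferred when one is available.

```python
import numpy as np

def jacobian_fd(model, x, theta, h=1e-6):
    """Forward-difference approximation of the sensitivity (Jacobian) matrix V,
    where model(x, theta) returns the n-vector of predicted responses eta(theta)
    and V[i, j] = d eta_i / d theta_j evaluated at the nominal theta."""
    theta = np.asarray(theta, dtype=float)
    eta0 = model(x, theta)
    V = np.zeros((eta0.size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h * max(1.0, abs(theta[j]))   # scale the perturbation to the parameter
        V[:, j] = (model(x, theta + step) - eta0) / step[j]
    return V
```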

  8. Parameter Estimation – Gauss-Newton Iteration Iterative procedure consisting of: • Linearization about the current estimate of the parameters • Solution of the linear(ized) regression problem to obtain the next parameter estimate update δ^(j) = (V^T V)^(-1) V^T (y − η(θ^(j))), with θ^(j+1) = θ^(j) + δ^(j) • Iteration until a convergence criterion is satisfied • for example, ||θ^(j+1) − θ^(j)|| / ||θ^(j)|| < tolerance J. McLellan
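A minimal Gauss-Newton loop, assuming additive errors and reusing the jacobian_fd sketch above; the exponential model and the small data set at the bottom are purely illustrative. A production implementation would add step-size control and better conditioning (see the later slides).

```python
import numpy as np

def gauss_newton(model, x, y, theta0, tol=1e-8, max_iter=50):
    """Basic Gauss-Newton iteration: linearize, solve the linearized LS problem
    for the increment, update, and repeat until the relative change is small."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        resid = y - model(x, theta)                 # current residual vector
        V = jacobian_fd(model, x, theta)            # sensitivity matrix at current iterate
        delta, *_ = np.linalg.lstsq(V, resid, rcond=None)   # solves V @ delta ~= resid
        theta = theta + delta
        if np.linalg.norm(delta) <= tol * (np.linalg.norm(theta) + tol):
            break                                   # relative progress in the estimates
    return theta

# illustrative model and data: y = theta1 * (1 - exp(-theta2 * x))
model = lambda x, th: th[0] * (1.0 - np.exp(-th[1] * x))
x_data = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y_data = np.array([2.1, 3.5, 5.1, 6.0, 6.3])
theta_hat = gauss_newton(model, x_data, y_data, theta0=[6.0, 0.3])
```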

  9. Parameter Estimation - Nonlinear Regression Case [Figure: tangent plane approximation – the observations and the approximating observation vector shown relative to the tangent plane of the expectation surface] J. McLellan

  10. Quality of the Linear Approximation … depends on two components: • Degree to which the tangent plane provides a good approximation to the expectation surface – the planar assumption – related to intrinsic nonlinearity • Uniformity of the coordinates on the expectation surface – the "uniform coordinates" assumption, related to parameter-effects nonlinearity – the linearization implies a uniform coordinate system on the tangent plane approximation: equal changes in a given parameter produce equal-sized increments on the tangent plane, whereas equal-sized increments in a given parameter may map to unequal-sized increments on the expectation surface J. McLellan

  11. Parameter Estimation – Gauss-Newton Iteration Parameter estimate after the jth iteration: θ^(j+1) = θ^(j) + δ^(j) Convergence • can be declared by looking at: • relative progress in the parameter estimates • relative progress in reducing the sum of squares function • a combination of both progress in sum of squares reduction and progress in the parameter estimates J. McLellan

  12. Parameter Estimation – Gauss-Newton Iteration Convergence • the relative change criteria in sum of squares or parameter estimates terminate on lack of progress, rather than convergence (Bates and Watts, 1988) • alternative – due to Bates and Watts, termed the relative offset criterion • we will have converged to the true optimum (least squares estimates) if the residual vector is orthogonal to the nonlinear expectation surface, and in particular, its tangent plane approximation at the true parameter values • if we haven’t converged, the residual vector won’t necessarily be orthogonal to the tangent plane at the current parameter iterate J. McLellan
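One simple way to monitor how close the residual vector is to being orthogonal to the tangent plane is sketched below: project the residual onto the column space of V and report the relative size of the tangential component. This is only in the spirit of the Bates and Watts relative offset criterion, which additionally scales the tangential component by the degrees of freedom and the residual standard error; the function name is an illustrative choice.

```python
import numpy as np

def tangential_fraction(V, resid):
    """Fraction of the residual that lies in the tangent plane spanned by the
    columns of V; it approaches 0 as the least squares solution is reached."""
    Q, _ = np.linalg.qr(V)                  # orthonormal basis for the tangent plane
    tangential = Q @ (Q.T @ resid)          # component of the residual in the plane
    return np.linalg.norm(tangential) / np.linalg.norm(resid)
```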

  13. Computational Issues in Gauss-Newton Iteration The Gauss-Newton iteration can be subject to poor numerical conditioning, as the linearization is recomputed at new parameter iterates • Conditioning problems arise in inversion of V^T V • Solution – use a decomposition technique • QR decomposition • Singular Value Decomposition (SVD) • Decomposition techniques will accommodate changes in rank of the Jacobian (sensitivity) matrix V J. McLellan
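The sketch below shows the two decomposition routes for the linearized step, avoiding explicit formation and inversion of V^T V; the function names are illustrative, and the SVD version truncates small singular values to cope with a (nearly) rank-deficient V.

```python
import numpy as np

def gn_step_qr(V, resid):
    """Increment from the linearized LS problem V @ delta ~= resid via thin QR."""
    Q, R = np.linalg.qr(V)
    return np.linalg.solve(R, Q.T @ resid)      # solve the triangular system for the increment

def gn_step_svd(V, resid, rcond=1e-10):
    """Same increment via the SVD, discarding directions with tiny singular values."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    s_inv = np.where(s > rcond * s.max(), 1.0 / s, 0.0)
    return Vt.T @ (s_inv * (U.T @ resid))
```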

  14. Other numerical estimation methods • Focus on minimizing the sum of squares function using optimization techniques • Newton-Raphson solution • Solve for increments using second-order approximation of sum of squares function • Levenberg-Marquardt compromise • Modification of the Gauss-Newton iteration, with introduction of factor to improve conditioning of linear regression step • Nelder-Mead • Simplex-based direct search method – doesn’t use derivative information • Hybrid approaches • Use combination of derivative-free and derivative-based methods J. McLellan
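In practice these methods are usually accessed through a library optimizer. A minimal sketch using SciPy's least_squares is shown below, reusing the illustrative exponential model and data from the Gauss-Newton sketch; method="lm" selects a Levenberg-Marquardt implementation, while "trf" is a trust-region alternative that also accepts parameter bounds.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(theta, x, y):
    """Residuals for the illustrative model y = theta1 * (1 - exp(-theta2 * x))."""
    return y - theta[0] * (1.0 - np.exp(-theta[1] * x))

x_data = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y_data = np.array([2.1, 3.5, 5.1, 6.0, 6.3])

fit = least_squares(residuals, x0=[6.0, 0.3], args=(x_data, y_data), method="lm")
theta_hat = fit.x       # least squares estimates
sse = 2.0 * fit.cost    # fit.cost is 0.5 * (sum of squared residuals)
```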

  15. Other numerical estimation methods • In general, the least squares parameter estimation approach represents a minimization problem • Use optimization technique to find parameter estimates to minimize the sum of squares of the residuals J. McLellan

  16. Inference – Joint Confidence Regions • Approximate confidence regions for parameters and predictions can be obtained by using a linearization approach • Approximate covariance matrix for the parameter estimates: Cov(θ̂) ≈ σ² (V̂^T V̂)^(-1), where V̂ denotes the Jacobian of the expectation mapping evaluated at the least squares parameter estimates • This covariance matrix is asymptotically the true covariance matrix for the parameter estimates as the number of data points becomes infinite • 100(1-α)% joint confidence region for the parameters: (θ − θ̂)^T V̂^T V̂ (θ − θ̂) ≤ p s² F(p, n−p; 1−α) • compare to the linear regression case, where V̂ is replaced by the design matrix X J. McLellan
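A sketch of the linearization-based covariance matrix and an ellipsoidal joint-region check, following the expressions on this slide; the function names are illustrative, and V_hat would come from, e.g., the jacobian_fd sketch evaluated at the least squares estimates.

```python
import numpy as np
from scipy import stats

def approx_covariance(V_hat, resid):
    """Cov(theta_hat) ~= s^2 * (V^T V)^(-1), with s^2 = SSE / (n - p)."""
    n, p = V_hat.shape
    s2 = (resid @ resid) / (n - p)
    return s2 * np.linalg.inv(V_hat.T @ V_hat), s2

def in_joint_region(theta, theta_hat, V_hat, s2, alpha=0.05):
    """True if theta lies inside the approximate 100(1-alpha)% joint confidence
    region (theta - theta_hat)^T V^T V (theta - theta_hat) <= p s^2 F(p, n-p)."""
    n, p = V_hat.shape
    d = np.asarray(theta) - np.asarray(theta_hat)
    return d @ (V_hat.T @ V_hat) @ d <= p * s2 * stats.f.ppf(1.0 - alpha, p, n - p)
```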

  17. Inference – Marginal Confidence Intervals • Marginal confidence intervals • Confidence intervals on individual parameters: θ̂_i ± t(n−p; α/2) s.e.(θ̂_i), where s.e.(θ̂_i) is the approximate standard error of the parameter estimate – the square root of the i-th diagonal element of the approximate parameter estimate covariance matrix, with the noise variance estimated as in the linear case J. McLellan
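A corresponding sketch for the marginal intervals, using the covariance matrix from the previous block; degrees of freedom of n - p (MSE-based noise variance estimate) are assumed.

```python
import numpy as np
from scipy import stats

def marginal_intervals(theta_hat, cov, n, p, alpha=0.05):
    """theta_i_hat +/- t(n-p; alpha/2) * se(theta_i_hat), with the standard error
    taken as the square root of the i-th diagonal entry of the covariance matrix."""
    se = np.sqrt(np.diag(cov))
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, n - p)
    return np.column_stack((theta_hat - t_crit * se, theta_hat + t_crit * se))
```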

  18. Inference – Predictions & Confidence Intervals • Confidence intervals on predictions of existing points in the dataset • Reflect propagation of variability from the parameter estimates to the predictions • Expressions for nonlinear regression case based on linear approximation and direct extension of results for linear regression First, let’s review the linear regression case… J. McLellan

  19. Precision of the Predicted Responses - Linear From the linear regression module (module 1) – The predicted response from an estimated model has uncertainty, because it is a function of the parameter estimates, which have uncertainty: e.g., the Solder Wave Defect Model, first response at the point (-1, -1, -1). If the parameter estimates were uncorrelated, the variance of the predicted response would simply be the sum of the individual parameter estimate variances weighted by the squares of the corresponding run conditions (recall the results for the variance of a sum of random variables) J. McLellan

  20. Precision of the Predicted Responses - Linear In general, both the variances and covariances of the parameter estimates must be taken into account. For prediction at the k-th data point: Var(ŷ_k) = x_k^T Cov(β̂) x_k = σ² x_k^T (X^T X)^(-1) x_k Note – x_k denotes the vector of run conditions for the k-th run (the k-th row of the design matrix X) J. McLellan

  21. Precision of the Predicted Responses - Nonlinear Linearize the prediction equation about the least squares estimate: ŷ_k ≈ f(x_k, θ̂) + v_k^T (θ − θ̂) For prediction at the k-th data point: Var(ŷ_k) ≈ v_k^T Cov(θ̂) v_k Note – v_k is the k-th row of the sensitivity matrix V̂, i.e., the gradient of f(x_k, θ) with respect to θ evaluated at θ̂ J. McLellan
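A small sketch of the propagated prediction variance for the nonlinear case: each fitted value's variance is v_k^T Cov(θ̂) v_k, with v_k the k-th row of the sensitivity matrix; cov would come from the approx_covariance sketch above.

```python
import numpy as np

def prediction_variance(V_hat, cov):
    """Approximate Var(y_hat_k) = v_k^T Cov(theta_hat) v_k for every run k,
    where v_k is the k-th row of the sensitivity matrix at the estimates."""
    return np.einsum("kj,jl,kl->k", V_hat, cov, V_hat)
```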

  22. Estimating Precision of Predicted Responses Use an estimate of the inherent noise variance σ². The degrees of freedom for the estimated variance of the predicted response are those of the estimate of the noise variance, which can come from: • replicates • an external estimate • the MSE, s² = SSE/(n−p) – the same choices apply in the linear and nonlinear cases J. McLellan

  23. Confidence Limits for Predicted Responses Linear and Nonlinear Cases: Follow an approach similar to that for parameters - 100(1-α)% confidence limits for the predicted response at the k-th run are ŷ_k ± t(ν; α/2) √Var(ŷ_k) • degrees of freedom ν are those of the inherent noise variance estimate If the prediction is for a response at conditions OTHER than one of the experimental runs, the inherent noise variance must be added (see the next slide), and the limits are ŷ_new ± t(ν; α/2) √(Var(ŷ_new) + σ̂²)

  24. Precision of “Future” Predictions - Explanation Suppose we want to predict the response at conditions other than those of the experimental runs --> a future run. The value we observe will consist of the deterministic component plus the noise component. In predicting this value, we must consider: • uncertainty from our prediction of the deterministic component • the noise component The variance of this future prediction is Var(y_future) = Var(ŷ) + σ², where Var(ŷ) is computed using the same expression as for the variance of predicted responses at experimental run conditions - for the linear case, with x containing the specific run conditions, Var(ŷ) = σ² x^T (X^T X)^(-1) x J. McLellan
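A sketch of the corresponding limits, covering both cases on the previous slides: for a future observation the estimated inherent noise variance is added to the variance propagated from the parameter estimates. The degrees of freedom dof are assumed to be those of the noise variance estimate (n - p if the MSE is used).

```python
import numpy as np
from scipy import stats

def prediction_limits(y_hat, var_pred, s2, dof, alpha=0.05, future=False):
    """100(1-alpha)% limits: y_hat +/- t(dof; alpha/2) * sqrt(var_pred [+ s2]),
    with s2 added only when predicting a future (new) observation."""
    var_total = var_pred + (s2 if future else 0.0)
    half_width = stats.t.ppf(1.0 - alpha / 2.0, dof) * np.sqrt(var_total)
    return y_hat - half_width, y_hat + half_width
```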

  25. Properties of LS Parameter Estimates Key Point - parameter estimates are random variables • because of how stochastic variation in data propagates through estimation calculations • parameter estimates have a variability pattern - probability distribution and density functions Unbiased • “average” of repeated data collection / estimation sequences will be true value of parameter vector J. McLellan

  26. Properties of Parameter Estimates Linear Regression Case • Least squares estimates are – • Unbiased • Consistent • Efficient Nonlinear Regression Case • Least squares estimates are – • Asymptotically unbiased – as the number of data points becomes infinite • Consistent • Asymptotically efficient J. McLellan

  27. Diagnostics for nonlinear regression • Similar to linear case • Qualitative – residual plots • Residuals vs. • Factors in model • Sequence (observation) number • Factors not in model (covariates) • Predicted responses • Things to look for: • Trend remaining • Non-constant variance • Meandering in sequence number – serial correlation • Qualitative – plot of observed and predicted responses • Predicted vs. observed – slope of 1 • Predicted and observed – as function of independent variable(s) J. McLellan

  28. Diagnostics for nonlinear regression • Quantitative diagnostics • Ratio tests: • MSR/MSE – as in the linear case – coarse measure of significant trend being modeled • Lack of fit test – if replicates are present • As in linear case – compute lack of fit sum of squares, error sum of squares, compare ratio • R-squared • coarse measure of significant trend • squared correlation of observed and predicted values • adjusted R-squared • R-squared adjusted for the number of parameters – penalizes the addition of parameters that contribute little J. McLellan
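A brief sketch of the R-squared diagnostics described here, computed as the squared correlation between observed and predicted responses with the usual degrees-of-freedom adjustment; p is assumed to count all estimated parameters.

```python
import numpy as np

def r_squared_diagnostics(y, y_hat, p):
    """R^2 as the squared correlation of observed and predicted values, and the
    adjusted R^2 that penalizes the number of estimated parameters."""
    n = y.size
    r2 = np.corrcoef(y, y_hat)[0, 1] ** 2
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return r2, r2_adj
```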

  29. Diagnostics for nonlinear regression • Quantitative diagnostics • Parameter confidence intervals: • Examine marginal intervals for parameters • Based on linear approximations • Can also use hypothesis tests • Consider dropping parameters that aren’t statistically significant • Issue in this case – parameters often appear in more complex expressions involving both factors and other parameters • E.g., Arrhenius reaction rate expression • If possible, examine joint confidence regions J. McLellan

  30. Diagnostics for nonlinear regression • Quantitative diagnostics • Parameter estimate correlation matrix: • Examine correlation matrix for parameter estimates • Based on linear approximation • Compute covariance matrix, then normalize using pairs of standard deviations • Note significant correlations and keep these in mind when retaining/deleting parameters using marginal significance tests • Significant correlation between some parameter estimates may indicate over-parameterization relative to the data collected • Consider dropping some of the parameters whose estimates are highly correlated • Further discussion – Chapter 3 - Bates and Watts (1988), Chapter 5 - Seber and Wild (1989) J. McLellan

  31. Practical Considerations • Convergence – • “tuning” of estimation algorithm – e.g., step size factors • Knowledge of the sum of squares (or likelihood or posterior density) surface – are there local minima? • Consider plotting the surface • Reparameterization • Ensuring physically realistic parameter estimates • Common problem – parameters should be positive • Solutions • Constrained optimization approach to enforce non-negativity of parameters • Reparameterization – for example, θ = exp(β) or θ = β² keeps θ positive, while θ = exp(β)/(1 + exp(β)) keeps θ bounded between 0 and 1 J. McLellan
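A minimal sketch of the reparameterization idea: the optimizer works with an unconstrained β while the physical parameters are recovered through the transformations. The wrapper name and the choice of which parameter gets which transform are illustrative.

```python
import numpy as np

to_positive = lambda b: np.exp(b)                   # theta = exp(beta) > 0
to_unit = lambda b: 1.0 / (1.0 + np.exp(-b))        # 0 < theta = logistic(beta) < 1

def residuals_in_beta(beta, x, y, model):
    """Residual function posed in the unconstrained beta, so any unconstrained
    optimizer automatically respects the constraints on theta."""
    theta = np.array([to_positive(beta[0]), to_unit(beta[1])])
    return y - model(x, theta)
```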

  32. Practical considerations • Correlation between parameter estimates • Reduce by reparameterization • Exponential example – J. McLellan

  33. Practical considerations • Particular example – Arrhenius rate expression k = k0 exp(−E/(RT)) • Reparameterize as k = k_ref exp(−(E/R)(1/T − 1/T_ref)) – effectively the reaction rate relative to a reference temperature T_ref • Reduces correlation between the parameter estimates and improves conditioning of the estimation problem J. McLellan
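A side-by-side sketch of the two parameterizations; the gas constant value and the reference temperature T_ref (which would be chosen inside the range of the data) are illustrative.

```python
import numpy as np

R_GAS = 8.314  # J/(mol K)

def rate_standard(T, k0, Ea):
    """Standard Arrhenius form k = k0 * exp(-Ea / (R T)); estimates of k0 and Ea
    are typically highly correlated."""
    return k0 * np.exp(-Ea / (R_GAS * T))

def rate_reparam(T, k_ref, Ea, T_ref=350.0):
    """Reparameterized form k = k_ref * exp(-(Ea/R) * (1/T - 1/T_ref)); k_ref is
    the rate constant at the reference temperature, which reduces the correlation
    between the estimates."""
    return k_ref * np.exp(-(Ea / R_GAS) * (1.0 / T - 1.0 / T_ref))
```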

  34. Practical considerations • Scaling – of parameters and responses • Choices • Scale by nominal values • Nominal values – design centre point, typical value over range, average value • Scale by standard errors • Parameters – estimate of standard deviation of parameter estimate • Responses – by standard deviation of observations – noise standard deviation • Combinations – by nominal value / standard error • Scaling can improve conditioning of the estimation problem (e.g., scale sensitivity matrix V), and can facilitate comparison of terms on similar (dimensionless) bases J. McLellan

  35. Practical considerations • Initial guesses • From prior knowledge • From prior results • By simplifying model equations • By exploiting conditionally linear parameters – fix these, estimate remaining parameters J. McLellan

  36. Estimating parameters in differential equation models • Model is now described by a differential equation: dy/dt = f(y, u, t; θ), with initial condition y(t0) = y0 • Referred to as “compartment models” in the biosciences. • Issues – • Estimation – what is the effective expectation function here? • Integral curve or flow (solution to the differential equation) • Initial conditions – known?, unknown and estimated?, fixed (conditional estimation)? • Performing the Gauss-Newton iteration • Or other numerical approach • Solving the differential equation J. McLellan

  37. Estimating parameters in differential equation models What is the effective expectation function here? • Differential equation model: dy/dt = f(y, u, t; θ) • y – response, u – independent variables (factors), t – becomes a factor as well • The expectation function is the solution to the differential equation, evaluated at the times at which observations are taken • Note the implicit dependence on the initial conditions, which may be assumed or estimated • Often this is a conceptual expectation function rather than an analytical solution – in practice it is the numerical solution at the specific observation times, produced by an integration subroutine J. McLellan

  38. Estimating parameters in differential equation models • Expectation mapping: η_i(θ) = y(t_i; θ), the solution of the differential equation evaluated at the i-th observation time • Random noise – is assumed to be additive on the observations, y_i = y(t_i; θ) + ε_i J. McLellan

  39. Estimating parameters in differential equation models Estimation approaches • Least squares (Gauss-Newton/Newton-Raphson iteration), maximum likelihood, Bayesian • Will require sensitivity information – sensitivity matrix V How can we get sensitivity information without having an explicit solution to the differential equation model? J. McLellan

  40. Estimating parameters in differential equation models Sensitivity equations • We can interchange the order of differentiation with respect to t and θ to obtain differential equations for the sensitivities – referred to as the sensitivity equations: d/dt (∂y/∂θ_j) = (∂f/∂y)(∂y/∂θ_j) + ∂f/∂θ_j • Note that the initial condition for the response may also be a function of the parameters – e.g., if we assume that the process is initially at steady state --> parametric dependence through the steady-state form of the model • These differential equations are solved to obtain the parameter sensitivities at the necessary time points: t1, …, tn J. McLellan

  41. Estimating parameters in differential equation models Sensitivity equations • The sensitivity equations are coupled with the original model differential equations – for the single differential equation (and response) case, we will have p+1 simultaneous differential equations, where p is the number of parameters J. McLellan
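The sketch below integrates a simple single-response model, dy/dt = -theta1*y + theta2*u(t), together with its two sensitivity equations (p + 1 = 3 coupled ODEs) using SciPy; the model, parameter values, step input, and observation times are all illustrative, and the sensitivity initial conditions are zero because the initial condition y(0) = 0 is taken to be independent of the parameters.

```python
import numpy as np
from scipy.integrate import solve_ivp

def augmented_rhs(t, z, theta, u):
    """State z = [y, dy/dtheta1, dy/dtheta2] for dy/dt = -theta1*y + theta2*u(t)."""
    y, s1, s2 = z
    theta1, theta2 = theta
    dy = -theta1 * y + theta2 * u(t)
    ds1 = -theta1 * s1 - y            # sensitivity equation for theta1
    ds2 = -theta1 * s2 + u(t)         # sensitivity equation for theta2
    return [dy, ds1, ds2]

theta = (0.5, 2.0)                    # illustrative parameter values
u = lambda t: 1.0                     # unit step input
t_obs = np.linspace(0.0, 10.0, 21)    # observation times t1, ..., tn

sol = solve_ivp(augmented_rhs, (0.0, 10.0), [0.0, 0.0, 0.0],
                t_eval=t_obs, args=(theta, u), rtol=1e-8, atol=1e-10)
y_pred = sol.y[0]      # model prediction at the observation times
V = sol.y[1:].T        # n x p sensitivity matrix for this value of theta
```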

  42. Estimating parameters in differential equation models Variations on single response differential equation models • Single response differential equation models need not be restricted to single differential equations • We really have a single measured output variable, and multiple factors • Control terminology – multi-input single-output (MISO) system [Equations: the MISO differential equation model and its associated sensitivity equations] J. McLellan

  43. Interpreting sensitivity responses Example – first-order linear differential equation with step input [Figure: step response of the output together with the corresponding parameter sensitivity responses] J. McLellan
