
CHEE801 Module 5:




  1. CHEE801 Module 5: Nonlinear Regression J. McLellan

  2. Outline - Single response • Notation • Assumptions • Least Squares Estimation – Gauss-Newton Iteration, convergence criteria, numerical optimization • Diagnostics • Properties of Estimators and Inference • Other estimation formulations – maximum likelihood and Bayesian estimators • Dealing with differential equation models • And then on to multi-response… J. McLellan

  3. Notation Model specification – • the model equation is y = f(x, θ) + ε, where ε is the random noise component, x is the vector of explanatory variables (the ith run conditions are xi), and θ is a p-dimensional vector of parameters • with n experimental runs, we have η(θ) = [f(x1, θ), …, f(xn, θ)]ᵀ, which defines the expectation surface • the nonlinear regression model is y = η(θ) + ε • Model specification involves the form of the equation and the parameterization J. McLellan

  4. Example #1 (Bates and Watts, 1988) Rumford data – • Cooling experiment – grind cannon barrel with blunt bore, and then monitor temperature while it cools • Newton’s law of cooling – differential equation with exponential solution • Independent variable is t (time) • Ambient T was 60 F • Model equation • 1st-order dynamic decay J. McLellan
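The model equation on this slide was an image; a standard reconstruction, assuming the Bates and Watts (1988) parameterization with an initial barrel temperature of 130 °F, is:

```latex
f(t;\theta) \;=\; 60 \;+\; 70\, e^{-\theta t}
```

Here 60 °F is the ambient temperature, 70 °F the initial temperature excess, and θ the first-order cooling rate constant (min⁻¹).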

  5. Parameter Estimation – Linear Regression Case approximating observation vector residual vector observations Expectation surface J. McLellan

  6. Parameter Estimation - Nonlinear Regression Case approximating observation vector residual vector observations expectation surface J. McLellan

  7. Rumford Example • Consider two observations – 2-dimensional observation space • At t=4, t=41 min J. McLellan

  8. Parameter Estimation – Gauss-Newton Iteration Least squares estimation – minimize Iterative procedure consisting of: • Linearization about the current estimate of the parameters • Solution of the linear(ized) regression problem to obtain the next parameter estimate • Iteration until a convergence criterion is satisfied J. McLellan

  9. Linearization about a nominal parameter vector Linearize the expectation function η(θ) in terms of the parameter vector θ about a nominal vector θ0: • Sensitivity Matrix • Jacobian of the expectation function • contains first-order sensitivity information J. McLellan
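In symbols (a reconstruction of the slide's image equations), the linearization and the n × p sensitivity matrix take the usual form:

```latex
\eta(\theta) \;\approx\; \eta(\theta^{0}) + V^{0}\,(\theta - \theta^{0}),
\qquad
V^{0}_{ij} \;=\; \left.\frac{\partial \eta_i(\theta)}{\partial \theta_j}\right|_{\theta^{0}}
```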

  10. Parameter Estimation – Gauss-Newton Iteration Iterative procedure consisting of: • Linearization about the current estimate of the parameters • Solution of the linear(ized) regression problem to obtain the next parameter estimate update • Iteration until a convergence criterion is satisfied • for example, ‖θ(j+1) − θ(j)‖ / ‖θ(j)‖ < ε for a specified tolerance ε J. McLellan
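The iteration can be sketched in code. This is a minimal illustration, assuming the exponential cooling model f(t; θ) = 60 + 70 e^(−θt) from the Rumford example, with synthetic data (not the actual Rumford measurements):

```python
import numpy as np

def f(theta, t):
    # Assumed cooling model: 60 F ambient, 70 F initial temperature excess
    return 60.0 + 70.0 * np.exp(-theta * t)

def sensitivity(theta, t):
    # n x 1 Jacobian of the expectation function w.r.t. the single parameter
    return (-70.0 * t * np.exp(-theta * t)).reshape(-1, 1)

def gauss_newton(y, t, theta0, tol=1e-8, max_iter=50):
    theta = float(theta0)
    for _ in range(max_iter):
        r = y - f(theta, t)                            # residual vector
        V = sensitivity(theta, t)                      # linearize at current iterate
        delta, *_ = np.linalg.lstsq(V, r, rcond=None)  # linearized LS step
        theta += delta[0]
        if abs(delta[0]) < tol * (abs(theta) + tol):   # relative-change criterion
            break
    return theta

# Synthetic demonstration data
t = np.linspace(4.0, 41.0, 13)
rng = np.random.default_rng(1)
y = f(0.01, t) + rng.normal(0.0, 0.2, size=t.size)
theta_hat = gauss_newton(y, t, theta0=0.05)
```

Each pass solves the linearized regression problem for the parameter increment and repeats until the relative change in the estimate is negligible.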

  11. Parameter Estimation - Nonlinear Regression Case approximating observation vector observations Tangent plane approximation J. McLellan

  12. Quality of the Linear Approximation … depends on two components: • Degree to which the tangent plane provides a good approximation to the expectation surface – the planar assumption – related to intrinsic nonlinearity • Uniformity of the coordinates on the expectation surface – the linearization implies a uniform coordinate system on the tangent plane approximation: equal changes in a given parameter produce equal-sized increments on the tangent plane, but equal-sized increments in a given parameter may map to unequal-sized increments on the expectation surface J. McLellan

  13. Rumford Example • Consider two observations – 2-dimensional observation space • At t=4, t=41 min • (Figure: expectation surface for θ from 0 to 10, with θ changed in increments of 0.025 – note the non-uniformity in coordinates on the expectation surface compared with the tangent plane approximation) J. McLellan

  14. Rumford example • Model function • Dataset consists of 13 observations • Exercise – sensitivity matrix? • Dimensions? J. McLellan

  15. Rumford example – tangent approximation • At θ = 0.05, note the uniformity in coordinates on the tangent plane versus the non-uniformity in coordinates on the expectation surface J. McLellan

  16. Rumford example – tangent approximation • At θ = 0.7, J. McLellan

  17. Parameter Estimation – Gauss-Newton Iteration Parameter estimation after jth iteration: Convergence • can be declared by looking at: • relative progress in the parameter estimate • relative progress in reducing the sum of squares function • combination of both progress in sum of squares reduction and progress in parameter estimates J. McLellan

  18. Parameter Estimation – Gauss-Newton Iteration Convergence • the relative change criteria in sum of squares or parameter estimates terminate on lack of progress, rather than convergence (Bates and Watts, 1988) • alternative – due to Bates and Watts, termed the relative offset criterion • we will have converged to the true optimum (least squares estimates) if the residual vector is orthogonal to the nonlinear expectation surface, and in particular, its tangent plane approximation at the true parameter values • if we haven't converged, the residual vector won't necessarily be orthogonal to the tangent plane at the current parameter iterate J. McLellan

  19. Parameter Estimation – Gauss-Newton Iteration Convergence • declare convergence by comparing component of residual vector lying on tangent plane to the component orthogonal to the tangent plane – if the component on the tangent plane is small, then we are close to orthogonality  convergence • Note also that after each iteration, the residual vector is orthogonal to the tangent plane computed at the previous parameter iterate (where the linearization is conducted), and not necessarily to the tangent plane and expectation surface at the most recently computed parameter estimate J. McLellan
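A sketch of the orthogonality check, in the spirit of the relative offset criterion. The exact Bates–Watts scaling is not shown on the slide, so the tolerance and normalization here are illustrative only:

```python
import numpy as np

def tangent_plane_components(V, r):
    # Columns of Q from a thin QR decomposition span the tangent plane
    Q, _ = np.linalg.qr(V)
    r_tan = Q @ (Q.T @ r)    # component of the residual on the tangent plane
    return r_tan, r - r_tan  # tangential and orthogonal components

def near_orthogonal(V, r, tol=1e-3):
    # Declare (approximate) convergence when the tangential component is
    # negligible next to the orthogonal one; this mirrors the idea of the
    # relative offset criterion but is not the exact Bates-Watts formula.
    r_tan, r_orth = tangent_plane_components(V, r)
    return np.linalg.norm(r_tan) <= tol * np.linalg.norm(r_orth)
```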

  20. Computational Issues in Gauss-Newton Iteration The Gauss-Newton iteration can be subject to poor numerical conditioning, as the linearization is recomputed at new parameter iterates • Conditioning problems arise in inversion of VᵀV • Solution – use a decomposition technique • QR decomposition • Singular Value Decomposition (SVD) • Decomposition techniques will accommodate changes in rank of the Jacobian (sensitivity) matrix V J. McLellan
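A minimal sketch of the linearized step computed by SVD rather than by inverting VᵀV; truncating tiny singular values is what accommodates rank deficiency or near-singularity of the sensitivity matrix:

```python
import numpy as np

def gn_step_svd(V, r, rcond=1e-10):
    # Solve the linearized problem V @ delta ~ r via the SVD instead of
    # forming and inverting V^T V; directions with negligible singular
    # values are dropped, giving the minimum-norm least-squares step.
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    keep = s > rcond * s[0]
    return Vt[keep].T @ ((U[:, keep].T @ r) / s[keep])
```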

  21. Other numerical estimation methods • Focus on minimizing the sum of squares function using optimization techniques • Newton-Raphson solution • Solve for increments using second-order approximation of sum of squares function • Levenberg-Marquardt compromise • Modification of the Gauss-Newton iteration, with introduction of factor to improve conditioning of linear regression step • Nelder-Mead • Pattern search method – doesn’t use derivative information • Hybrid approaches • Use combination of derivative-free and derivative-based methods J. McLellan

  22. Other numerical estimation methods • In general, the least squares parameter estimation approach represents a minimization problem • Use optimization technique to find parameter estimates to minimize the sum of squares of the residuals J. McLellan

  23. Newton-Raphson approach • Start with the residual sum of squares function S(θ) and form the 2nd-order Taylor series expansion, where H is the Hessian of S(θ): • the Hessian is the multivariable second derivative for a function of a vector • Now solve for the next move by applying the stationarity condition (take the 1st derivative, set it to zero) J. McLellan
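Written out (a reconstruction of the slide's image equations), with g(θ) = ∇S(θ):

```latex
S(\theta) \;\approx\; S(\theta^{0}) + g(\theta^{0})^{T}(\theta-\theta^{0})
  + \tfrac{1}{2}\,(\theta-\theta^{0})^{T} H(\theta^{0})\,(\theta-\theta^{0}),
\qquad
\theta^{1} \;=\; \theta^{0} - H(\theta^{0})^{-1}\, g(\theta^{0})
```

The second expression follows from setting the gradient of the quadratic approximation to zero.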

  24. Hessian • Is the matrix of second derivatives – (consider using Maple to generate!) J. McLellan

  25. Jacobian and Hessian of S(θ) • Can be found by the chain rule: the sensitivity matrix that we had before: V • Often used as an approximation of the Hessian – "expected value of the Hessian" • The exact Hessian also involves the second derivatives of η – a 3-dimensional array (tensor) J. McLellan
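For S(θ) = ‖y − η(θ)‖², the chain rule gives (reconstructing the image equations):

```latex
\nabla S(\theta) \;=\; -2\,V^{T}\,[\,y - \eta(\theta)\,],
\qquad
H \;=\; 2\,V^{T}V \;-\; 2\sum_{i=1}^{n} [\,y_i - \eta_i(\theta)\,]\,
     \frac{\partial^{2} \eta_i}{\partial \theta\,\partial \theta^{T}}
```

Dropping the second (residual-weighted curvature) term gives the approximate "expected" Hessian 2 VᵀV.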

  26. Newton-Raphson approach • Using the approximate Hessian (which is always positive semi-definite), the change in parameter estimate is as follows, where V is the sensitivity matrix evaluated at θ(i). • This is the Gauss-Newton iteration! • Issues – computing and updating the Hessian matrix • Potentially better progress – information about curvature • Hessian can cease to be positive definite (required in order for a stationary point to be a minimum) J. McLellan

  27. Levenberg-Marquardt approach • Improve the conditioning of the inverse by adding a factor – biased regression solution – • Levenberg modification, where Ip is the p×p identity matrix • Marquardt modification, where D is a matrix containing the diagonal entries of VᵀV • If λ → 0, approach the Gauss-Newton iteration • If λ → ∞, approach the direction of steepest descent – gradient optimization technique J. McLellan
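A sketch of the Levenberg form (VᵀV + λI) δ = Vᵀr with the usual λ adaptation, again assuming the exponential cooling model from the earlier slides:

```python
import numpy as np

def f(theta, t):
    # Assumed exponential cooling model from the Rumford example
    return 60.0 + 70.0 * np.exp(-theta * t)

def jac(theta, t):
    return (-70.0 * t * np.exp(-theta * t)).reshape(-1, 1)

def levenberg_marquardt(y, t, theta0, lam=1e-2, max_iter=200):
    theta = float(theta0)
    S = np.sum((y - f(theta, t)) ** 2)
    for _ in range(max_iter):
        r = y - f(theta, t)
        V = jac(theta, t)
        A = V.T @ V + lam * np.eye(V.shape[1])   # Levenberg: add lambda * I
        delta = np.linalg.solve(A, V.T @ r)
        S_new = np.sum((y - f(theta + delta[0], t)) ** 2)
        if S_new < S:
            theta, S = theta + delta[0], S_new
            lam /= 10.0   # step accepted: trust linearization more (toward Gauss-Newton)
        else:
            lam *= 10.0   # step rejected: damp the step (toward steepest descent)
        if abs(delta[0]) < 1e-12:
            break
    return theta
```

Small λ recovers the Gauss-Newton step; large λ shrinks the step toward the negative gradient direction.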

  28. Inference – Joint Confidence Regions • Approximate confidence regions for parameters and predictions can be obtained by using a linearization approach • Approximate covariance matrix for parameter estimates, where V̂ denotes the Jacobian of the expectation mapping evaluated at the least squares parameter estimates • This covariance matrix is asymptotically the true covariance matrix for the parameter estimates as the number of data points becomes infinite • 100(1-α)% joint confidence region for the parameters: • compare to the linear regression case J. McLellan
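The approximate expressions, reconstructed to match the linear-regression analogues (the Jacobian V̂ plays the role of X):

```latex
\widehat{\mathrm{Cov}}(\hat\theta) \;\approx\; s^{2}\,(\hat V^{T}\hat V)^{-1},
\qquad
(\theta - \hat\theta)^{T}\,\hat V^{T}\hat V\,(\theta - \hat\theta)
  \;\le\; p\,s^{2}\,F_{p,\,n-p;\,\alpha}
```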

  29. Inference – Marginal Confidence Intervals • Marginal confidence intervals • Confidence intervals on individual parameters, where the approximate standard error of the parameter estimate is the square root of the i-th diagonal element of the approximate parameter estimate covariance matrix, with the noise variance estimated as in the linear case J. McLellan
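In symbols (reconstructed to be consistent with the covariance expression on the previous slide):

```latex
\hat\theta_i \;\pm\; t_{n-p;\,\alpha/2}\;
  \sqrt{\big[\,s^{2}(\hat V^{T}\hat V)^{-1}\big]_{ii}}
```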

  30. Inference – Predictions & Confidence Intervals • Confidence intervals on predictions of existing points in the dataset • Reflect propagation of variability from the parameter estimates to the predictions • Expressions for nonlinear regression case based on linear approximation and direct extension of results for linear regression First, let’s review the linear regression case… J. McLellan

  31. Precision of the Predicted Responses - Linear From the linear regression module (module 1) – The predicted response from an estimated model has uncertainty, because it is a function of the parameter estimates which have uncertainty: e.g., Solder Wave Defect Model - first response at the point -1,-1,-1 If the parameter estimates were uncorrelated, the variance of the predicted response would be: (recall results for variance of sum of random variables) J. McLellan

  32. Precision of the Predicted Responses - Linear In general, both the variances and covariances of the parameter estimates must be taken into account. For prediction at the k-th data point: Note - J. McLellan

  33. Precision of the Predicted Responses - Nonlinear Linearize the prediction equation about the least squares estimate: For prediction at the k-th data point: Note - J. McLellan
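With v_k denoting the k-th row of the Jacobian V̂ (the gradient of f(x_k, θ) with respect to θ at θ̂), the linearized variance is (a reconstruction consistent with the linear-case result x_kᵀ Cov(β̂) x_k):

```latex
\hat y_k = f(x_k, \hat\theta), \qquad
\mathrm{Var}(\hat y_k) \;\approx\; v_k^{T}\,\mathrm{Cov}(\hat\theta)\,v_k
  \;\approx\; s^{2}\, v_k^{T}\, (\hat V^{T}\hat V)^{-1}\, v_k
```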

  34. Estimating Precision of Predicted Responses Use an estimate of the inherent noise variance. The degrees of freedom for the estimated variance of the predicted response are those of the estimate of the noise variance: • replicates • external estimate • MSE (formulas for the linear and nonlinear cases were shown on the slide) J. McLellan

  35. Confidence Limits for Predicted Responses Linear and Nonlinear Cases: Follow an approach similar to that for parameters - 100(1-α)% confidence limits for predicted response at the k-th run are: • degrees of freedom are those of the inherent noise variance estimate If the prediction is for a response at conditions OTHER than one of the experimental runs, the limits are: J. McLellan

  36. Precision of “Future” Predictions - Explanation Suppose we want to predict the response at conditions other than those of the experimental runs --> future run. The value we observe will consist of the deterministic component plus the noise component. In predicting this value, we must consider: • uncertainty from our prediction of the deterministic component • the noise component The variance of this future prediction is the sum of these, where the deterministic-component term is computed using the same expression as for the variance of predicted responses at experimental run conditions - For the linear case, with x containing specific run conditions, J. McLellan
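Because the noise on the future observation is independent of the parameter estimates, the two variance contributions add; in the linear case this gives the familiar result:

```latex
\mathrm{Var}(y_{new} - \hat y_{new}) \;=\; \sigma^{2} + \mathrm{Var}(\hat y_{new})
  \;=\; \sigma^{2}\,\big[\,1 + x^{T}(X^{T}X)^{-1}x\,\big]
```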

  37. Properties of LS Parameter Estimates Key Point - parameter estimates are random variables • because of how stochastic variation in data propagates through estimation calculations • parameter estimates have a variability pattern - probability distribution and density functions Unbiased • “average” of repeated data collection / estimation sequences will be true value of parameter vector J. McLellan

  38. Properties of Parameter Estimates Consistent • behaviour as number of data points tends to infinity • with probability 1, • distribution narrows as N becomes large Efficient • variance of least squares estimates is less than that of other types of parameter estimates J. McLellan

  39. Properties of Parameter Estimates Linear Regression Case • Least squares estimates are – • Unbiased • Consistent • Efficient Nonlinear Regression Case • Least squares estimates are – • Asymptotically unbiased – as number of data points becomes infinite • Consistent • Asymptotically efficient J. McLellan

  40. Diagnostics for nonlinear regression • Similar to linear case • Qualitative – residual plots • Residuals vs. • Factors in model • Sequence (observation) number • Factors not in model (covariates) • Predicted responses • Things to look for: • Trend remaining • Non-constant variance • Meandering in sequence number – serial correlation • Qualitative – plot of observed and predicted responses • Predicted vs. observed – slope of 1 • Predicted and observed – as function of independent variable(s) J. McLellan

  41. Diagnostics for nonlinear regression • Quantitative diagnostics • Ratio tests: • MSR/MSE – as in the linear case – coarse measure of significant trend being modeled • Lack of fit test – if replicates are present • As in linear case – compute lack of fit sum of squares, error sum of squares, compare ratio • R-squared • coarse measure of significant trend • squared correlation of observed and predicted values • adjusted R-squared • R-squared adjusted for the number of parameters in the model J. McLellan

  42. Diagnostics for nonlinear regression • Quantitative diagnostics • Parameter confidence intervals: • Examine marginal intervals for parameters • Based on linear approximations • Can also use hypothesis tests • Consider dropping parameters that aren't statistically significant • Issue in this case – parameters are more likely to be involved in more complex expressions involving factors and parameters • E.g., Arrhenius reaction rate expression • If possible, examine joint confidence regions, likelihood regions, HPD regions • Can also test to see if a set of parameter values lies in a particular region J. McLellan

  43. Diagnostics for nonlinear regression • Quantitative diagnostics • Parameter estimate correlation matrix: • Examine correlation matrix for parameter estimates • Based on linear approximation • Compute covariance matrix, then normalize using pairs of standard deviations • Note significant correlations and keep these in mind when retaining/deleting parameters using marginal significance tests • Significant correlation between some parameter estimates may indicate over-parameterization relative to the data collected • Consider dropping some of the parameters whose estimates are highly correlated • Further discussion – Chapter 3 - Bates and Watts (1988), Chapter 5 - Seber and Wild (1989) J. McLellan

  44. Practical Considerations • Convergence – • “tuning” of estimation algorithm – e.g., step size factors • Knowledge of the sum of squares (or likelihood or posterior density) surface – are there local minima? • Consider plotting surface • Reparameterization • Ensuring physically realistic parameter estimates • Common problem – parameters should be positive • Solutions • Constrained optimization approach to enforce non-negativity of parameters • Reparameterization – for example, θ = exp(φ) (positive), θ = φ² (positive), θ = exp(φ)/(1 + exp(φ)) (bounded between 0 and 1) J. McLellan

  45. Practical considerations • Correlation between parameter estimates • Reduce by reparameterization • Exponential example – J. McLellan
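The slide's exponential example equation is not shown; one standard reparameterization of this kind (an assumption, not necessarily the slide's own example) centers the exponent at the mean run condition x̄, which reduces the correlation between the two parameter estimates:

```latex
\theta_1\, e^{-\theta_2 x} \;=\; \phi_1\, e^{-\theta_2 (x - \bar x)},
\qquad
\phi_1 \;=\; \theta_1\, e^{-\theta_2 \bar x}
```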

  46. Practical considerations • Particular example – Arrhenius rate expression • Effectively reaction rate relative to reference temperature • Reduces correlation between parameter estimates and improves conditioning of estimation problem J. McLellan
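In symbols (reconstructing the image equation; T_ref is a reference temperature chosen within the range of the data):

```latex
k(T) \;=\; A\, e^{-E/(RT)}
     \;=\; k_{ref}\,\exp\!\left[-\frac{E}{R}\left(\frac{1}{T} - \frac{1}{T_{ref}}\right)\right],
\qquad
k_{ref} \;=\; A\, e^{-E/(R\,T_{ref})}
```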

  47. Practical considerations • Scaling – of parameters and responses • Choices • Scale by nominal values • Nominal values – design centre point, typical value over range, average value • Scale by standard errors • Parameters – estimate of standard deviation of parameter estimate • Responses – by standard deviation of observations – noise standard deviation • Combinations – by nominal value / standard error • Scaling can improve conditioning of the estimation problem (e.g., scale sensitivity matrix V), and can facilitate comparison of terms on similar (dimensionless) bases J. McLellan

  48. Practical considerations • Initial guesses • From prior knowledge • From prior results • By simplifying model equations • By exploiting conditionally linear parameters – fix these, estimate remaining parameters J. McLellan

  49. Dealing with heteroscedasticity • Problem it poses – precision of parameter estimates • Weighted least squares estimation • Variance stabilizing transformations – e.g., Box-Cox transformations J. McLellan

  50. Estimating parameters in differential equation models • Model is now described by a differential equation: • Referred to as “compartment models” in the biosciences. • Issues – • Estimation – what is the effective expectation function here? • Integral curve or flow (solution to differential equation) • Initial conditions – known?, unknown and estimated?, fixed (conditional estimation)? • Performing Gauss-Newton iteration • Or other numerical approach • Solving differential equation J. McLellan
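A minimal sketch of the differential-equation case, assuming Newton's law of cooling dT/dt = −θ(T − 60) with a fixed (known) initial condition T0 = 130 °F: the expectation function is the ODE solution sampled at the run times, and the Gauss-Newton sensitivities are obtained by finite differences of that solution.

```python
import numpy as np

def solve_ode(theta, t_obs, T0=130.0, dt=0.05):
    # Fixed-step RK4 integration of dT/dt = -theta * (T - 60);
    # T0 = 130 F is an assumed, fixed initial condition.
    rhs = lambda T: -theta * (T - 60.0)
    n = int(np.ceil(t_obs[-1] / dt))
    ts = np.linspace(0.0, t_obs[-1], n + 1)
    h = ts[1] - ts[0]
    T = np.empty(n + 1)
    T[0] = T0
    for i in range(n):
        k1 = rhs(T[i])
        k2 = rhs(T[i] + 0.5 * h * k1)
        k3 = rhs(T[i] + 0.5 * h * k2)
        k4 = rhs(T[i] + h * k3)
        T[i + 1] = T[i] + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    # The expectation function: the integral curve sampled at the run times
    return np.interp(t_obs, ts, T)

def fit_ode_parameter(y, t_obs, theta0, h=1e-6, max_iter=30):
    # Gauss-Newton with finite-difference sensitivities of the ODE solution
    theta = float(theta0)
    for _ in range(max_iter):
        r = y - solve_ode(theta, t_obs)
        V = ((solve_ode(theta + h, t_obs) - solve_ode(theta - h, t_obs))
             / (2.0 * h)).reshape(-1, 1)
        delta, *_ = np.linalg.lstsq(V, r, rcond=None)
        theta += delta[0]
        if abs(delta[0]) < 1e-10:
            break
    return theta
```

Treating T0 as fixed corresponds to conditional estimation; an unknown initial condition would simply become an extra column in the sensitivity matrix.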
