
Global predictors of regression fidelity



  1. Global predictors of regression fidelity • A single number to characterize the overall quality of the surrogate. • Equivalence measures • Coefficient of multiple determination • Adjusted coefficient of multiple determination • Prediction accuracy measures • Model independent: Cross validation error • Model dependent: Standard error

  2. Linear Regression • The surrogate is a linear combination of given shape functions: ŷ(x) = Σ b_i ξ_i(x) • For a linear approximation in one variable, ξ_1(x) = 1 and ξ_2(x) = x • The difference (error) between the data and the surrogate at point j is e_j = y_j − ŷ(x_j) • Minimize the square error Σ e_j² • Differentiate with respect to the coefficients to obtain the normal equations XᵀX b = Xᵀy, where X_ji = ξ_i(x_j)
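The steps above can be sketched numerically. This is a minimal illustration with an assumed linear basis (1, x) and made-up data; the seed and point count are arbitrary:

```python
import numpy as np

# Fit y = b1 + b2*x by least squares: shape functions xi_1 = 1, xi_2 = x.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = x + rng.normal(0.0, 1.0, x.size)        # y = x plus unit-variance noise

X = np.column_stack([np.ones_like(x), x])   # X_ji = xi_i(x_j)
# Differentiating the squared error gives the normal equations X^T X b = X^T y
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b                               # surrogate predictions
print(b)                                    # close to [0, 1]
```

Solving the normal equations directly mirrors the derivation; in practice `np.linalg.lstsq` does the same job more robustly for ill-conditioned bases.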

  3. Coefficient of multiple determination • Equivalence of the surrogate with the data is often measured by how much of the variance in the data is captured by the surrogate. • Coefficient of multiple determination R² = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)², and its adjusted version R²_adj = 1 − (1 − R²)(n − 1)/(n − n_β), which penalizes the number of coefficients n_β
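As a sketch, the two measures can be computed as follows (the function name and test data are made up for illustration):

```python
import numpy as np

def r2_scores(y, y_hat, n_coeff):
    """Coefficient of multiple determination and its adjusted version.
    R^2 is the fraction of the variance in the data captured by the
    surrogate; the adjusted version penalizes the number of coefficients."""
    sse = np.sum((y - y_hat) ** 2)          # error sum of squares
    sst = np.sum((y - np.mean(y)) ** 2)     # total sum of squares
    n = y.size
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_coeff)
    return r2, r2_adj

# Made-up data and a close fit from a 2-coefficient surrogate:
y = np.array([1.0, 2.0, 2.9, 4.2])
y_hat = np.array([1.1, 1.9, 3.0, 4.1])
r2, r2_adj = r2_scores(y, y_hat, n_coeff=2)
print(r2, r2_adj)
```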

  4. R2 does not reflect accuracy • Compare y1 = x to y2 = 0.1x, each with the same added noise (normally distributed with zero mean and standard deviation of 1). • Estimate the average errors between the function (red) and the surrogate (blue). R2=0.9785 R2=0.3016
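A sketch of this experiment (the setup details are assumed, so the numbers will not match the slide's values exactly): fit straight lines to y1 = x and y2 = 0.1x, each contaminated with the *same* standard-normal noise, and compare R² with the rms error against the noise-free function.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
noise = rng.normal(0.0, 1.0, x.size)            # shared noise sample
X = np.column_stack([np.ones_like(x), x])

results = {}
for slope in (1.0, 0.1):
    y = slope * x + noise
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ b
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    rms_true = np.sqrt(np.mean((y_hat - slope * x) ** 2))  # error vs truth
    results[slope] = (r2, rms_true)
    print(f"slope={slope}: R2={r2:.4f}, rms error vs true function={rms_true:.4f}")
```

The two fits have identical rms error against the true function (the fitted noise is the same in both cases), yet their R² values differ sharply, which is the slide's point: R² measures variance captured, not accuracy.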

  5. Cross validation • Validation consists of checking the surrogate at a set of validation points. • This may be considered wasteful because we do not use all the points for fitting the best possible surrogate. • Cross validation divides the data into n_g groups. • Fit the approximation to n_g − 1 groups, and use the last group to estimate the error. Repeat for each group. • When each group consists of one point, the error is often called PRESS (prediction error sum of squares). • Calculate the error at each point and then present the rms error. • For linear regression it can be shown that the cross-validation residual at point i is e_i/(1 − h_ii), where e_i is the ordinary residual and h_ii is the i-th diagonal element of X(XᵀX)⁻¹Xᵀ, so no refitting is needed
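A sketch of leave-one-out cross validation on assumed data, done two ways: by refitting without each point, and via the linear-regression shortcut, which should agree exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 10)
y = x + rng.normal(0.0, 1.0, x.size)
X = np.column_stack([np.ones_like(x), x])

# Direct definition: refit with point i left out, predict at point i.
press = np.empty(x.size)
for i in range(x.size):
    keep = np.arange(x.size) != i
    b = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    press[i] = y[i] - X[i] @ b

# Shortcut for linear regression: e_i / (1 - h_ii), no refitting.
b_all = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b_all
h = np.sum(X * (X @ np.linalg.inv(X.T @ X)), axis=1)   # diag of hat matrix
press_shortcut = e / (1.0 - h)

rms_press = np.sqrt(np.mean(press ** 2))
print(rms_press, np.allclose(press, press_shortcut))   # the two agree
```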

  6. Model based error for linear regression • The common assumptions for linear regression: • The true function is described by the functional form of the surrogate. • The data is contaminated with normally distributed error with the same standard deviation σ at every point. • The errors at different points are not correlated. • Under these assumptions, the noise standard deviation (called the standard error) is estimated as σ̂ = √(eᵀe/(n − n_β)), where n is the number of data points and n_β the number of coefficients. • σ̂ is used as an estimate of the prediction error.
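A minimal sketch of the standard-error estimate on assumed data, where the true noise standard deviation is 1, so σ̂ should come out near 1:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 40)
y = x + rng.normal(0.0, 1.0, x.size)           # true noise sigma = 1
X = np.column_stack([np.ones_like(x), x])      # n_beta = 2 coefficients

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
# sigma_hat^2 = SSE / (n - n_beta)
sigma_hat = np.sqrt(e @ e / (x.size - X.shape[1]))
print(sigma_hat)                               # near the true value of 1
```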

  7. Comparison of errors • For the example in slide 4 of y = x plus the Gaussian noise, the fit was ŷ = 0.5981 + 0.9970x. • The noise came from randn, set to zero mean and unit standard deviation; however, the sample had a mean of 0.552 and a standard deviation of 1.3. • The standard error is calculated as 1.32 and the cross-validation (PRESS) error as 1.37. • With less data, the differences would be larger. • The actual error was only about 0.6 because the large amount of data filtered the noise.
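A sketch reproducing the flavor of this comparison with a freshly generated noise sample (so the numbers will not match the slide's 1.32/1.37/0.6): the standard error and PRESS estimate the noise level, while the actual error against the true function is much smaller with plenty of data.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 50)
y = x + rng.normal(0.0, 1.0, x.size)                      # y = x plus noise
X = np.column_stack([np.ones_like(x), x])

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
sigma_hat = np.sqrt(e @ e / (x.size - 2))                 # standard error
h = np.sum(X * (X @ np.linalg.inv(X.T @ X)), axis=1)      # hat-matrix diagonal
rms_press = np.sqrt(np.mean((e / (1 - h)) ** 2))          # cross-validation error
actual = np.sqrt(np.mean((X @ b - x) ** 2))               # error vs true function
print(sigma_hat, rms_press, actual)  # actual error is much smaller
```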

  8. Problems: regression accuracy • The pairs (0,0), (1,1), (2,1) represent strain (millistrains) and stress (ksi) measurements. • Estimate Young’s modulus using regression. • Calculate the error in Young’s modulus using cross validation, both from the definition and from the formula on Slide 5. • Repeat the example of y = x, using only data at x = 3, 6, 9, …, 30. Use the same noise values as given for these points in the notes for Slide 4.

  9. Prediction variance in Linear Regression • Assumptions on noise in linear regression allow us to estimate the prediction variance due to the noise at any point. • Prediction variance is usually large when you are far from a data point. • We distinguish between interpolation, when we are in the convex hull of the data points, and extrapolation where we are outside. • Extrapolation is associated with larger errors, and in high dimensions it usually cannot be avoided.

  10. Prediction variance • Linear regression model: ŷ(x) = Σ b_i ξ_i(x) • Define x_m as the vector of shape-function values at the point x, (x_m)_i = ξ_i(x); then ŷ = x_mᵀ b • With some algebra, Var[ŷ(x)] = σ² x_mᵀ(XᵀX)⁻¹x_m • Standard error: s_ŷ = σ̂ √(x_mᵀ(XᵀX)⁻¹x_m)
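The formula can be sketched as a small helper (the function name is made up; the example below assumes a linear basis (1, x1, x2) with data at the four vertices (±1, ±1) and unit noise):

```python
import numpy as np

def prediction_stderr(X, x_m, sigma=1.0):
    """Standard error of the prediction at a point:
    s = sigma * sqrt(x_m^T (X^T X)^{-1} x_m),
    where x_m holds the shape-function values at that point."""
    x_m = np.asarray(x_m, float)
    M = np.linalg.inv(X.T @ X)
    return sigma * np.sqrt(x_m @ M @ x_m)

# Assumed example: basis (1, x1, x2), data at the four vertices (+-1, +-1).
X4 = np.array([[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]], float)
print(prediction_stderr(X4, [1.0, 0.0, 0.0]))  # origin: 0.5
print(prediction_stderr(X4, [1.0, 1.0, 1.0]))  # vertex: about 0.866
```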

  11. Example of prediction variance • For a linear polynomial response surface y = b1 + b2x1 + b3x2, find the prediction variance in the region −1 ≤ x1 ≤ 1, −1 ≤ x2 ≤ 1 • (a) For data at three vertices (omitting (1,1))
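A sketch of this example, assuming the domain −1 ≤ x1, x2 ≤ 1 implied by the vertices: with data at the three vertices (−1,−1), (1,−1), (−1,1), the linear fit interpolates exactly, and the prediction standard error follows from s = σ √(x_mᵀ(XᵀX)⁻¹x_m).

```python
import numpy as np

# Rows hold the shape-function values (1, x1, x2) at the three data points.
X = np.array([[1, -1, -1],
              [1,  1, -1],
              [1, -1,  1]], float)
M = np.linalg.inv(X.T @ X)

def stderr(x1, x2, sigma=1.0):
    x_m = np.array([1.0, x1, x2])
    return sigma * np.sqrt(x_m @ M @ x_m)   # prediction standard error

print(stderr(0, 0))      # origin: sqrt(1/2), about 0.707
print(stderr(-1, -1))    # a data vertex: 1.0
print(stderr(1, 1))      # the omitted vertex: sqrt(3), about 1.732
```

The error grows sharply at (1,1), the one corner with no data, previewing the interpolation-vs-extrapolation comparison that follows.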

  12. Interpolation vs. Extrapolation • At the origin the standard error is √(1/2)σ ≈ 0.71σ. • At the 3 data vertices it is 1.0σ. • At the omitted vertex (1,1) it is √3σ ≈ 1.73σ, illustrating the larger error of extrapolation.

  13. Standard error contours • The minimum error is obtained by setting to zero the derivative of the prediction variance with respect to x1 and x2. • What is special about this point? • Contours of the prediction variance provide more detail.

  14. Data at four vertices • Now XᵀX = 4I • And the prediction variance is Var[ŷ] = σ²(1 + x1² + x2²)/4 • The error at the vertices is (√3/2)σ ≈ 0.87σ • At the origin the minimum is 0.5σ • How can we reduce the error without adding points?
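The closed form can be checked numerically over the assumed domain −1 ≤ x1, x2 ≤ 1: with data at all four vertices, XᵀX = 4I, so the prediction variance should equal σ²(1 + x1² + x2²)/4 everywhere.

```python
import numpy as np

X = np.array([[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]], float)
M = np.linalg.inv(X.T @ X)
assert np.allclose(X.T @ X, 4 * np.eye(3))          # X^T X = 4I

# Evaluate x_m^T (X^T X)^{-1} x_m on a grid over the square.
x1, x2 = np.meshgrid(np.linspace(-1, 1, 21), np.linspace(-1, 1, 21))
pts = np.stack([np.ones_like(x1), x1, x2], axis=-1)  # shape-function values
var = np.einsum('...i,ij,...j->...', pts, M, pts)
assert np.allclose(var, (1 + x1**2 + x2**2) / 4)     # closed form holds

print(np.sqrt(var).min(), np.sqrt(var).max())  # 0.5 at origin, ~0.866 at vertices
```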

  15. Graphical Comparison of Standard Errors • (Figure: standard error contours for three points vs. four points)

  16. Problems: prediction variance • Redo the four-point example when the data points are not at the corners but inside the domain, at ±0.8. What does the difference in the results tell you? • For a grid of 3×3 data points, compare the standard errors for linear and quadratic fits.
