Sampling plans for linear regression

Given a domain, we can reduce the prediction error by a good choice of the sampling points. The choice of sampling locations is called “design of experiments” or DOE.

veta

Presentation Transcript


  1. Sampling plans for linear regression • Given a domain, we can reduce the prediction error by a good choice of the sampling points. • The choice of sampling locations is called “design of experiments” or DOE. • In this lecture we consider DOEs for linear regression with linear and quadratic polynomials, where the errors are due to noise in the data. • For a given number of points, the best DOE is the one that most reduces the prediction variance (reviewed in the next few slides). • The simplest DOE is the full factorial design, where we sample each variable (factor) at a fixed number of values (levels). • Example: with four factors and three levels each we sample 3^4 = 81 points. • Full factorial design is not practical except in low dimensions.
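The point count in the example can be checked by enumerating the design directly. This is a minimal sketch; the normalized level values (-1, 0, 1) and the helper name `full_factorial` are illustrative assumptions, not part of the lecture.

```python
from itertools import product

def full_factorial(n_factors, levels):
    """Return every combination of the given levels over n_factors variables."""
    return list(product(levels, repeat=n_factors))

# Four factors at three (assumed, normalized) levels each.
design = full_factorial(4, (-1.0, 0.0, 1.0))
print(len(design))  # 3**4 = 81
```

The count grows as levels^n, which is why the design becomes impractical beyond a few dimensions.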

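  2. Linear regression • The surrogate is a linear combination of given shape functions: $\hat{y}(\mathbf{x}) = \sum_{i=1}^{n_\beta} b_i \xi_i(\mathbf{x})$. • For a linear approximation: $\xi_1 = 1,\ \xi_2 = x_1,\ \xi_3 = x_2, \ldots$ • Difference (error) between data and surrogate: $\mathbf{e} = \mathbf{y} - X\mathbf{b}$, where $X_{ji} = \xi_i(\mathbf{x}_j)$. • Minimize the square error: $\mathbf{e}^T\mathbf{e} = (\mathbf{y} - X\mathbf{b})^T(\mathbf{y} - X\mathbf{b})$. • Differentiate with respect to $\mathbf{b}$ to obtain the normal equations: $X^T X \mathbf{b} = X^T \mathbf{y}$.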
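The least-squares step can be sketched in a few lines: build the matrix X of shape functions evaluated at the data points, form the normal equations X^T X b = X^T y, and solve. The test function, sample locations, and helper names (`solve`, `fit`) below are assumptions chosen for illustration; the solver is plain Gaussian elimination, adequate only for small, well-conditioned problems.

```python
def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def fit(shape_functions, xs, ys):
    """Least-squares coefficients b from the normal equations X^T X b = X^T y."""
    X = [[xi(x) for xi in shape_functions] for x in xs]
    nb = len(shape_functions)
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(nb)] for i in range(nb)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(nb)]
    return solve(XtX, Xty)

# Assumed test data: noise-free samples of y = 1 + 2x + 3x^2.
shape = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]
b = fit(shape, xs, ys)
print([round(v, 6) for v in b])
```

With noise-free data the fit should recover the generating coefficients up to rounding, which makes this a convenient sanity check before adding noise.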

  3. Model-based error for linear regression • The common assumptions for linear regression: • The true function is described by the functional form of the surrogate. • The data is contaminated with normally distributed error with the same standard deviation at every point. • The errors at different points are not correlated. • Under these assumptions, the noise standard deviation (called the standard error) is estimated as $\hat{\sigma} = \sqrt{\mathbf{e}^T\mathbf{e} / (n_y - n_\beta)}$, with $n_y$ data points and $n_\beta$ coefficients. • $\hat{\sigma}$ is used as an estimate of the prediction error.
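A small numeric illustration of the noise estimate: fit a straight line, collect the residuals, and divide the sum of squares by the number of points minus the number of coefficients. The four data values are made up for the example, and the closed-form line fit stands in for a general least-squares solve.

```python
from math import sqrt

# Assumed data for illustration: four noisy samples of a straight line.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.1, 2.9]

n_y = len(xs)   # number of data points
n_beta = 2      # coefficients of y = b1 + b2*x

# Closed-form least-squares fit of a straight line.
x_mean = sum(xs) / n_y
y_mean = sum(ys) / n_y
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
s_xx = sum((x - x_mean) ** 2 for x in xs)
b2 = s_xy / s_xx
b1 = y_mean - b2 * x_mean

# Residuals e = y - X b and the unbiased estimate of the noise level.
residuals = [y - (b1 + b2 * x) for x, y in zip(xs, ys)]
sse = sum(e * e for e in residuals)
sigma_hat = sqrt(sse / (n_y - n_beta))
print(b1, b2, sigma_hat)
```

Dividing by n_y - n_beta rather than n_y accounts for the degrees of freedom used up by the fitted coefficients; for a saturated design the denominator is zero and no noise estimate is available.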

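  4. Prediction variance • Linear regression model: $\hat{y}(\mathbf{x}) = \sum_{i=1}^{n_\beta} b_i \xi_i(\mathbf{x})$. • Define $\mathbf{x}^{(m)} = [\xi_1(\mathbf{x}), \ldots, \xi_{n_\beta}(\mathbf{x})]^T$; then $\hat{y} = \mathbf{x}^{(m)T}\mathbf{b}$. • With some algebra: $\mathrm{Var}[\hat{y}(\mathbf{x})] = \sigma^2\, \mathbf{x}^{(m)T}(X^TX)^{-1}\mathbf{x}^{(m)}$. • Standard error: $s_y = \hat{\sigma}\sqrt{\mathbf{x}^{(m)T}(X^TX)^{-1}\mathbf{x}^{(m)}}$.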
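The algebra behind the prediction variance can be sketched as follows, assuming uncorrelated noise with common variance $\sigma^2$ at every data point. Here $\mathbf{x}^{(m)}$ denotes the vector of shape functions evaluated at the prediction point and $\Sigma_b$ the covariance matrix of the coefficients; both symbols are notation adopted for this sketch.

```latex
\begin{aligned}
\mathbf{b} &= (X^T X)^{-1} X^T \mathbf{y}, \qquad \mathrm{Cov}[\mathbf{y}] = \sigma^2 I \\
\Sigma_b &= (X^T X)^{-1} X^T \,(\sigma^2 I)\, X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1} \\
\mathrm{Var}[\hat{y}(\mathbf{x})] &= \mathbf{x}^{(m)T}\, \Sigma_b \, \mathbf{x}^{(m)}
  = \sigma^2\, \mathbf{x}^{(m)T} (X^T X)^{-1} \mathbf{x}^{(m)}
\end{aligned}
```

The variance depends on the sampling locations only through $X^T X$, which is why the design can be chosen before any data is collected.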

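  5. Prediction variance for full factorial design • Recall that the standard error (square root of the prediction variance) is $s_y = \hat{\sigma}\sqrt{\mathbf{x}^{(m)T}(X^TX)^{-1}\mathbf{x}^{(m)}}$. • For a full factorial design the domain is normally a box. • Cheapest full factorial design: two levels (not good for quadratic polynomials). • For a linear polynomial, with the box scaled to $[-1,1]^n$, $X^TX = 2^n I$ and the standard error is then $s_y = \hat{\sigma}\sqrt{(1 + x_1^2 + \cdots + x_n^2)/2^n}$. • Maximum error at the vertices: $s_y = \hat{\sigma}\sqrt{(n+1)/2^n}$. • What does the ratio in the square root represent?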
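For a two-level full factorial design the calculation is short enough to verify numerically. The sketch below assumes the box is scaled to [-1, 1]^n and a linear polynomial with terms 1, x1, ..., xn; it confirms that X^T X is diagonal and evaluates the prediction variance divided by sigma^2. Function names are illustrative.

```python
from itertools import product

def two_level_design_matrix(n):
    """Rows [1, x1, ..., xn] for the 2^n vertices of the box [-1, 1]^n."""
    return [[1.0, *vertex] for vertex in product((-1.0, 1.0), repeat=n)]

def normalized_variance(n, x):
    """Prediction variance divided by sigma^2 at point x, linear polynomial."""
    X = two_level_design_matrix(n)
    size = n + 1
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(size)] for i in range(size)]
    # The design is orthogonal: X^T X is diagonal with 2^n on the diagonal.
    assert all(XtX[i][j] == (2 ** n if i == j else 0)
               for i in range(size) for j in range(size))
    xm = [1.0, *x]
    # With a diagonal X^T X the inverse is taken element by element.
    return sum(v * v / XtX[i][i] for i, v in enumerate(xm))

print(normalized_variance(2, (0.0, 0.0)))  # center of the box: 0.25
print(normalized_variance(2, (1.0, 1.0)))  # a vertex: 0.75
```

At a vertex the value is (n+1)/2^n, that is, the number of coefficients divided by the number of points.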

  6. Designs for linear polynomials • Traditionally use only two levels. • A design is orthogonal when $X^TX$ is diagonal. • The full factorial design is orthogonal; it is not so easy to produce other orthogonal designs with fewer points. • It is beneficial to place the points at the edges of the design domain. • Stability: a small variation of the prediction variance over the domain is also a desirable property.

  7. Example • Compare the prediction variance of an orthogonal design based on an equilateral triangle with that of a right triangle (both are saturated). • Linear polynomial $y = b_1 + b_2 x_1 + b_3 x_2$ • For the right triangle obtain

  8. Comparison • Prediction variances for the equilateral triangle • The maximum variance, at (1,1), is three times larger than the lowest one. • For the right triangle, the maximum variance (3) is six times the lowest, and triple that of the equilateral triangle. • A fairer comparison is when we restrict the triangle to lie inside the box. • The prediction variance is /3 • Maximum prediction variance (1.5) and stability (ratio of 4.5) are still better than for the right triangle, but by less.
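The comparison can be reproduced numerically under assumed vertex coordinates, since the transcript does not give them. The sketch uses an equilateral triangle centered at the origin with X^T X = 3I and a right triangle at three corners of the box [-1, 1]^2; these choices reproduce the quoted maxima of 1 and 3 at (1,1), but they remain assumptions. The helper names are illustrative.

```python
from math import sqrt

def inverse(A):
    """Gauss-Jordan inverse of a small square matrix (partial pivoting)."""
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def normalized_variance(points, x):
    """x_m^T (X^T X)^{-1} x_m for the linear model y = b1 + b2*x1 + b3*x2."""
    X = [[1.0, p[0], p[1]] for p in points]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    inv = inverse(XtX)
    xm = [1.0, x[0], x[1]]
    return sum(xm[i] * inv[i][j] * xm[j] for i in range(3) for j in range(3))

# ASSUMED vertex coordinates (not given in the transcript).
equilateral = [(-sqrt(1.5), -sqrt(0.5)), (sqrt(1.5), -sqrt(0.5)), (0.0, sqrt(2.0))]
right = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]

for name, pts in (("equilateral", equilateral), ("right", right)):
    at_origin = normalized_variance(pts, (0.0, 0.0))
    at_corner = normalized_variance(pts, (1.0, 1.0))
    print(name, round(at_origin, 4), round(at_corner, 4))
```

Under these assumptions the equilateral design gives 1/3 at the origin and 1 at (1,1), while the right triangle gives 0.5 at the origin and 3 at (1,1); the factor of six quoted for the right triangle matches the ratio of its corner value to its origin value.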

  9. Quadratic polynomial • A quadratic polynomial in n variables has (n+1)(n+2)/2 coefficients, so we need at least that many points. • We need at least three different values of each variable. • The simplest DOE is the three-level full factorial design, with 3^n points. • Impractical for n>5. • It also has an unreasonable ratio between the number of points and the number of coefficients. • For example, for n=8 we get 3^8 = 6561 samples for 45 coefficients. • My rule of thumb is that you want twice as many points as coefficients.
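The mismatch between the number of points and the number of coefficients is easy to tabulate. A minimal sketch, with illustrative function names:

```python
def n_quadratic_coefficients(n):
    """Number of coefficients of a full quadratic polynomial in n variables."""
    return (n + 1) * (n + 2) // 2

def n_full_factorial_points(n, levels=3):
    """Number of points in a full factorial design with the given levels."""
    return levels ** n

for n in (2, 5, 8):
    coeffs = n_quadratic_coefficients(n)
    points = n_full_factorial_points(n)
    print(n, coeffs, points, round(points / coeffs, 1))
```

For n=2 the ratio is a modest 1.5, but for n=8 the design uses well over a hundred points per coefficient, far beyond the factor of two suggested by the rule of thumb.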

  10. Central Composite Design • Includes $2^n$ vertices, $2n$ face points, plus $n_c$ repetitions of the central point. • Can choose α so as to: • achieve a spherical design • achieve rotatability (prediction variance is spherical) • stay in the box (face centered, FCCCD) • Still impractical for n>8.
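A central composite design is simple to generate. The sketch below assumes the box is scaled to [-1, 1]^n and uses the usual textbook values of α for each variant (√n for spherical, the fourth root of the number of vertices for rotatable, 1 for face centered); these formulas and the function names are not stated on the slide and are supplied here as assumptions.

```python
from itertools import product
from math import sqrt

def alpha_for(kind, n):
    """Axial distance for common CCD variants (standard textbook choices)."""
    if kind == "spherical":      # all non-center points at radius sqrt(n)
        return sqrt(n)
    if kind == "rotatable":      # fourth root of the number of vertices
        return (2 ** n) ** 0.25
    if kind == "face-centered":  # axial points on the faces of the box
        return 1.0
    raise ValueError(kind)

def central_composite(n, alpha, n_center=1):
    """2^n vertices, 2n axial points at distance alpha, n_center center points."""
    vertices = [list(v) for v in product((-1.0, 1.0), repeat=n)]
    axial = []
    for i in range(n):
        for sign in (-1.0, 1.0):
            point = [0.0] * n
            point[i] = sign * alpha
            axial.append(point)
    center = [[0.0] * n for _ in range(n_center)]
    return vertices + axial + center

design = central_composite(2, alpha_for("spherical", 2), n_center=1)
print(len(design))  # 4 vertices + 4 axial points + 1 center = 9
```

In two dimensions the spherical and rotatable values of α coincide at √2, so a spherical two-variable CCD is also rotatable.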

  11. Repeated observations at origin • Unlike linear designs, the prediction variance is high at the origin. • Repetition at the origin decreases the variance there and improves stability. • What other rationale is there for choosing the origin for repetition? • Repetition also gives an independent measure of the magnitude of the noise. • It can also be used for lack-of-fit tests.

  12. Without repetition (9 points) • Contours of prediction variance for spherical CCD design. • How come it is rotatable?

  13. Center repeated 5 times (13 points) • With five repetitions we reduce the maximum prediction variance and greatly improve the uniformity. • Five points is the optimum for uniformity.
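The effect of repeating the center point can be checked without the contour plots. This sketch assumes a two-variable spherical CCD with α = √2 and a full quadratic model, and evaluates the prediction variance divided by sigma^2; helper names are illustrative.

```python
from itertools import product
from math import sqrt

def inverse(A):
    """Gauss-Jordan inverse of a small square matrix (partial pivoting)."""
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def ccd(alpha, n_center):
    """Two-variable CCD: 4 vertices, 4 axial points, n_center center points."""
    pts = [list(v) for v in product((-1.0, 1.0), repeat=2)]
    pts += [[-alpha, 0.0], [alpha, 0.0], [0.0, -alpha], [0.0, alpha]]
    pts += [[0.0, 0.0] for _ in range(n_center)]
    return pts

def basis(x1, x2):
    """Full quadratic basis in two variables."""
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]

def normalized_variance(points, x):
    """x_m^T (X^T X)^{-1} x_m for the full quadratic model."""
    X = [basis(*p) for p in points]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(6)] for i in range(6)]
    inv = inverse(XtX)
    xm = basis(*x)
    return sum(xm[i] * inv[i][j] * xm[j] for i in range(6) for j in range(6))

for n_center in (1, 5):
    pts = ccd(sqrt(2.0), n_center)
    print(len(pts), round(normalized_variance(pts, (0.0, 0.0)), 4))
```

With a single center point the normalized variance at the origin is 1, because that one observation alone determines the constant term; with five center points it drops to 0.2.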

  14. Variance optimal designs • Full factorial and CCD are not flexible in the number of points. • Standard error: $s_y = \hat{\sigma}\sqrt{\mathbf{x}^{(m)T}(X^TX)^{-1}\mathbf{x}^{(m)}}$. • A key to most optimal DOE methods is the moment matrix $M = X^TX/n_y$. • A good design of experiments will maximize the terms in this matrix, especially the diagonal elements. • D-optimal designs maximize the determinant of the moment matrix. • The determinant is inversely proportional to the square of the volume of the confidence region on the coefficients.

  15. Example • Given the model $y = b_1 x_1 + b_2 x_2$ and the two data points (0,0) and (1,0), find the optimum third data point (p,q) in the unit square. • We have $X = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ p & q \end{bmatrix}$, $X^TX = \begin{bmatrix} 1+p^2 & pq \\ pq & q^2 \end{bmatrix}$, $\det(X^TX) = q^2$. • So the third point is (p,1), for any value of p. • Finding a D-optimal design in higher dimensions is a difficult optimization problem, often solved heuristically.
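The result of this example can be confirmed with a brute-force search. The sketch evaluates det(X^T X) for the model y = b1*x1 + b2*x2 over a coarse grid of candidate third points; the grid spacing of 0.1 is an arbitrary choice.

```python
def det_xtx(p, q):
    """det(X^T X) for the model y = b1*x1 + b2*x2 with points (0,0), (1,0), (p,q)."""
    X = [(0.0, 0.0), (1.0, 0.0), (p, q)]
    a = sum(r[0] * r[0] for r in X)  # sum of x1^2
    b = sum(r[0] * r[1] for r in X)  # sum of x1*x2
    c = sum(r[1] * r[1] for r in X)  # sum of x2^2
    return a * c - b * b

# Coarse grid over the unit square (resolution is an arbitrary choice).
grid = [i / 10 for i in range(11)]
best = max(det_xtx(p, q) for p in grid for q in grid)
maximizers = [(p, q) for p in grid for q in grid if abs(det_xtx(p, q) - best) < 1e-12]
print(best, maximizers)
```

Every maximizer has q = 1, and p is free, in agreement with det(X^T X) = q^2.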

  16. Matlab example
      >> ny=6;nbeta=6;
      >> [dce,x]=cordexch(2,ny,'quadratic');
      >> dce'
           1     1    -1    -1     0     1
          -1     1     1    -1    -1     0
      >> scatter(dce(:,1),dce(:,2),200,'filled')
      >> det(x'*x)/ny^nbeta
      ans = 0.0055
      With 12 points:
      >> ny=12;
      >> [dce,x]=cordexch(2,ny,'quadratic');
      >> dce'
          -1     1    -1     0     1     0     1    -1     1     0    -1     1
           1    -1    -1    -1     1     1    -1    -1     0     0     0     1
      >> scatter(dce(:,1),dce(:,2),200,'filled')
      >> det(x'*x)/ny^nbeta
      ans = 0.0102
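The first value reported by the session can be cross-checked without the toolbox. The sketch below, in Python rather than Matlab, rebuilds the full quadratic design matrix for the six printed points and evaluates det(X^T X)/ny^nbeta, the determinant of the moment matrix. The column order of the model terms is an assumption, but the determinant does not depend on it.

```python
def determinant(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in A]
    n = len(M)
    det = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-14:
            return 0.0
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            det = -det  # a row swap flips the sign
        det *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return det

def quadratic_row(x1, x2):
    """Full quadratic model terms; column order does not affect det(X^T X)."""
    return [1.0, x1, x2, x1 * x2, x1 * x1, x2 * x2]

# The six points printed by the session (columns of dce').
points = [(1, -1), (1, 1), (-1, 1), (-1, -1), (0, -1), (1, 0)]
X = [quadratic_row(float(a), float(b)) for a, b in points]
ny, nbeta = len(points), 6
XtX = [[sum(r[i] * r[j] for r in X) for j in range(nbeta)] for i in range(nbeta)]
d_value = determinant(XtX) / ny ** nbeta
print(round(d_value, 4))  # 0.0055
```

Because the six-point design is saturated, det(X^T X) equals det(X)^2 = 256, and 256/6^6 rounds to the reported 0.0055.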

  17. Other criteria • A-optimal minimizes trace of the inverse of the moment matrix. • This minimizes the sum of the variances of the coefficients. • G-optimality minimizes the maximum of the prediction variance.

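  18. Example • For the previous example, find the A-optimal design. • With $X^TX = \begin{bmatrix} 1+p^2 & pq \\ pq & q^2 \end{bmatrix}$, the trace of the inverse is $\mathrm{tr}\left[(X^TX)^{-1}\right] = (1 + p^2 + q^2)/q^2$. • Minimum at (0,1), so this point is both A-optimal and D-optimal.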
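The same brute-force check works for the A-optimality criterion. The sketch minimizes the trace of (X^T X)^{-1} over a coarse grid of candidate third points (p,q); using X^T X instead of the moment matrix only rescales the trace by the number of points and does not move the minimizer. The grid spacing is an arbitrary choice.

```python
def trace_inverse(p, q):
    """trace((X^T X)^{-1}) for points (0,0), (1,0), (p,q), model y = b1*x1 + b2*x2."""
    a = 1.0 + p * p      # sum of x1^2
    b = p * q            # sum of x1*x2
    c = q * q            # sum of x2^2
    det = a * c - b * b
    if det <= 1e-12:     # singular design: coefficients not identifiable
        return float("inf")
    # The inverse of a symmetric 2x2 matrix is [[c, -b], [-b, a]] / det.
    return (a + c) / det

grid = [i / 10 for i in range(11)]
best_trace, best_point = min((trace_inverse(p, q), (p, q)) for p in grid for q in grid)
print(best_point, best_trace)
```

Unlike the D-optimal criterion, which leaves p free, the trace criterion singles out p = 0.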

  19. Problems • Create a 13-point D-optimal design in two-dimensional space and compare its prediction variance to that of the CCD design shown on Slide 13. • Generate noisy data for the function $y=(x+y)^2$, fit it using the two designs, and compare the accuracy of the coefficients.
