- 42 Views
- Uploaded on
- Presentation posted in: General

4.4 Application--OLS Estimation. Background.

4.4 Application--OLS Estimation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

4.4 Application--OLS Estimation

- When we do a study of data and are looking at the relationship between 2 variables, and have reason to believe that a linear fit is appropriate, we need a way to determine a model that gives the optimal linear fit (one that best reflects the trend in the data).
- Ex. Relationship between hits and RBI’s
- A perfect fit would result if every point in the data exactly satisfied some equation y = a + bx , but this is next to impossible -- too much variability in real world data.

- Assume y = a + bx is the best fit for the data.
- Then we can find a point on the line, (xi, f(xi)), with the same x-value as each of the points in the data set, (xi,yi)
- Draw a diagram on the board.
- Then we say di = yi - f(xi) = distance from point in the data to the point on the line.
- di is also called the error or residual at xi -- how far data is from line
- So to measure the fit of the line, we could add up all the errors, d1 + d2 + … + dn
- However, note that a best fit line will have some data above and some below, so this error will turn out to be 0.

- Therefore, we need to make all of our errors positive by taking either |di| or di2.
- |di| will give the same weight to large and small errors, where di2 gives more weight to larger errors
- Ex d’s: {0,0,0,0,50} vs {10,10,10,10,10}
avg of |di| = 10 = 10

- Which should be considered a better fit?
- graph that goes right through 4 points, but nowhere near #5
- graph that is same distance from each point (yes)

- |di| method will not show this, but di2 method will.

- So, we will select the model which minimizes the sum of squared residuals:
- S = d12 + d22 + … + dn2 = [y1 - f(x1)]2 + …+ [yn - f(xn)]2
- This line is called the least squares approximating line
- We can use vectors to help us choose y = a + bx to minimize S

- S, which we will minimize, is just the sum of the squares of the entries in the matrix, Y-MZ.
- If n = 3, then

Y-MZ is a vector =

Then S = || Y-MZ||2

S = || Y-MZ||2

Recall

Y and M are given since we have 3 data points to fit.

We simply need to select Z to minimize S.

Let P be the set of all vectors MZ where Z varies:

It turns out that all of the vectors in set P lie in the same plane through the origin (we discuss why later in the book).

The equation of the plane is

Take a=0,b=1, or a,b=0 and find that this plane contains:

And the normal vector will be U x V =

Y

YY-MA

O

MZMA

Recall that we are trying to minimize S = || Y-MZ||2

Y = (y1,y2,y3) is a point in space, and MZ is some vector in the set P which we have illustrated as a plane.

S = || Y-MZ||2 is the squared distance from the point to the plane, so if we can find the point,MA, in the plane closest to Y, we will have our solution.

Y

YY-MA

O

MZMA

Y-MA is orthogonal to all vectors,MZ, in the plane, so

(MZ) • (Y-MA) = 0

Note this rule for dot products when vectors are written as matrices:

Y

YY-MA

O

MZMA

0 = (MZ) • (Y-MA) =(MZ)T(Y-MA)=ZTMT(Y-MA)

=ZT(MTY-MTMA) = Z • (MTY-MTMA)

The last dot product is in two dimensions and tells that (MTY-MTMA) is orthogonal to every possible Z which can only happen if (MTY-MTMA) = 0,so

MTY=MTMA called the normal equations for A

Y

YY-MA

O

MZMA

With x1, x2,x3 all distinct, we can show that MTM is invertible, so from MTY=MTMA ,we get A = (MTM)-1MTY,

This will give us A=(a,b) which will give then give us the point (a+bx1,a+bx2,a+bx3) closest to Y.

Thus the best fit line will then be y=a + bx.

Y

YY-MA

O

MZMA

Recall that this argument started by defining n=3 so that we could use a 3 dimensional argument with vectors. The argument becomes more complex, but does extend to any n.

- Suppose that n data points (x1,y1),…,(xn,yn) of which at least two x’s are distinct. If

Then, the least squares approximating line has equation y=a0 + a1x where A = is found by Gaussian

elimination from the normal equations MTY=MTMA

Since at least two x’s are distinct, MTM is invertible so A=(MTM)-1MTY

- Find the least squares approximating line for the following data: (1,0),(2,2),(4,5),(6,9),(8,12)
- See what you get with the TI83+

- Find an equation of the plane through P(1,3,2) with normal (2,0,-1).

We can generalize to select the least squares approximating polynmial of degree m: f(x)=a0+a1x+a2x2+…+anxn where we estimate the a’s

If n data points are given with at least m+1 x’s distinct, then

Then least squares approximating polynomial of degree m is: f(x)=a0+a1x+a2x2+…+anxn where

Is found by Gaussian elim from normal equations MTY=MTMA

Since at least m+1 x’s are distinct, MTM is invertible so

A=(MTM)-1MTY

- we need at least one more data point than the degree of the polynomial we are trying to estimate.
- I.e. With n data points, we could not estimate a polynomial of degree n.

- Find the least squares approximating quadratic for the following data points: (-2,0),(0,-4),(2,-10),(4,-9),(6,-3)