4.4 Application--OLS Estimation

1 / 21

# 4.4 Application--OLS Estimation - PowerPoint PPT Presentation

4.4 Application--OLS Estimation. Background.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about ' 4.4 Application--OLS Estimation' - ezra-pace

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### 4.4 Application--OLS Estimation

Background
• When we do a study of data and are looking at the relationship between 2 variables, and have reason to believe that a linear fit is appropriate, we need a way to determine a model that gives the optimal linear fit (one that best reflects the trend in the data).
• Ex. Relationship between hits and RBI’s
• A perfect fit would result if every point in the data exactly satisfied some equation y = a + bx , but this is next to impossible -- too much variability in real world data.
So what do we do?
• Assume y = a + bx is the best fit for the data.
• Then we can find a point on the line, (xi, f(xi)), with the same x-value as each of the points in the data set, (xi,yi)
• Draw a diagram on the board.
• Then we say di = yi - f(xi) = distance from point in the data to the point on the line.
• di is also called the error or residual at xi -- how far data is from line
• So to measure the fit of the line, we could add up all the errors, d1 + d2 + … + dn
• However, note that a best fit line will have some data above and some below, so this error will turn out to be 0.
So what do we do?
• Therefore, we need to make all of our errors positive by taking either |di| or di2.
• |di| will give the same weight to large and small errors, where di2 gives more weight to larger errors
• Ex d’s: {0,0,0,0,50} vs {10,10,10,10,10}

avg of |di| = 10 = 10

Which one is a better fit?
• Which should be considered a better fit?
• graph that goes right through 4 points, but nowhere near #5
• graph that is same distance from each point (yes)
• |di| method will not show this, but di2 method will.
Ordinary Least Squares Method
• So, we will select the model which minimizes the sum of squared residuals:
• S = d12 + d22 + … + dn2 = [y1 - f(x1)]2 + …+ [yn - f(xn)]2
• This line is called the least squares approximating line
• We can use vectors to help us choose y = a + bx to minimize S
Ordinary Least Squares Method
• S, which we will minimize, is just the sum of the squares of the entries in the matrix, Y-MZ.
• If n = 3, then

Y-MZ is a vector =

Then S = || Y-MZ||2

Ordinary Least Squares Method

S = || Y-MZ||2

Recall

Y and M are given since we have 3 data points to fit.

We simply need to select Z to minimize S.

Let P be the set of all vectors MZ where Z varies:

Ordinary Least Squares Method

It turns out that all of the vectors in set P lie in the same plane through the origin (we discuss why later in the book).

The equation of the plane is

Take a=0,b=1, or a,b=0 and find that this plane contains:

And the normal vector will be U x V =

Ordinary Least Squares Method

Y

YY-MA

O

MZMA

Recall that we are trying to minimize S = || Y-MZ||2

Y = (y1,y2,y3) is a point in space, and MZ is some vector in the set P which we have illustrated as a plane.

S = || Y-MZ||2 is the squared distance from the point to the plane, so if we can find the point,MA, in the plane closest to Y, we will have our solution.

Ordinary Least Squares Method

Y

YY-MA

O

MZMA

Y-MA is orthogonal to all vectors,MZ, in the plane, so

(MZ) • (Y-MA) = 0

Note this rule for dot products when vectors are written as matrices:

Ordinary Least Squares Method

Y

YY-MA

O

MZMA

0 = (MZ) • (Y-MA) =(MZ)T(Y-MA)=ZTMT(Y-MA)

=ZT(MTY-MTMA) = Z • (MTY-MTMA)

The last dot product is in two dimensions and tells that (MTY-MTMA) is orthogonal to every possible Z which can only happen if (MTY-MTMA) = 0,so

MTY=MTMA called the normal equations for A

Ordinary Least Squares Method

Y

YY-MA

O

MZMA

With x1, x2,x3 all distinct, we can show that MTM is invertible, so from MTY=MTMA ,we get A = (MTM)-1MTY,

This will give us A=(a,b) which will give then give us the point (a+bx1,a+bx2,a+bx3) closest to Y.

Thus the best fit line will then be y=a + bx.

Ordinary Least Squares Method

Y

YY-MA

O

MZMA

Recall that this argument started by defining n=3 so that we could use a 3 dimensional argument with vectors. The argument becomes more complex, but does extend to any n.

Theorem 1
• Suppose that n data points (x1,y1),…,(xn,yn) of which at least two x’s are distinct. If

Then, the least squares approximating line has equation y=a0 + a1x where A = is found by Gaussian

elimination from the normal equations MTY=MTMA

Since at least two x’s are distinct, MTM is invertible so A=(MTM)-1MTY

Example
• Find the least squares approximating line for the following data: (1,0),(2,2),(4,5),(6,9),(8,12)
• See what you get with the TI83+
Example
• Find an equation of the plane through P(1,3,2) with normal (2,0,-1).
We extend further...

We can generalize to select the least squares approximating polynmial of degree m: f(x)=a0+a1x+a2x2+…+anxn where we estimate the a’s

Theorem 2 (proof in ch 6)

If n data points are given with at least m+1 x’s distinct, then

Then least squares approximating polynomial of degree m is: f(x)=a0+a1x+a2x2+…+anxn where

Is found by Gaussian elim from normal equations MTY=MTMA

Since at least m+1 x’s are distinct, MTM is invertible so

A=(MTM)-1MTY

Note
• we need at least one more data point than the degree of the polynomial we are trying to estimate.
• I.e. With n data points, we could not estimate a polynomial of degree n.
Example
• Find the least squares approximating quadratic for the following data points: (-2,0),(0,-4),(2,-10),(4,-9),(6,-3)