## PowerPoint Slideshow about ' Regression' - allistair-guthrie


Presentation Transcript

Some problems you might want to look at

- Given the annual number of cancers of a certain type, over a few decades, make a prediction for the future, with uncertainty.
- There seems to be a connection between efficiency and size for Norwegian hospitals. Given data from many hospitals, determine if there is a connection, and what it is.
- Investigate the connection between efficiency and a number of possible explanatory variables.

Connection between variables

We would like to study the connection between x and y!

Connection between variables

Fit a line!

What can you do with a fitted line?

- Interpolation
- Extrapolation (sometimes dangerous!)
- Interpret the parameters of the line

How to define the line that ”fits best”?

Minimize the sum of the squares of the ”errors”: the least squares method!

- Note: many other ways to fit the line can be imagined

How to compute the line fit with the least squares method?

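- Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ denote the points in the plane.
- Find a and b so that $y = a + bx$ fits the points by minimizing

$$S = \sum (y_i - a - b x_i)^2$$

- Solution:

$$b = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a = \bar{y} - b\bar{x}$$

where $\bar{x} = \frac{1}{n}\sum x_i$, $\bar{y} = \frac{1}{n}\sum y_i$, and all sums are done for i=1,...,n.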
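The closed-form solution can be sketched in a few lines of Python. The function name `fit_line` and the data points are made up for illustration; the formula used is the standard least squares solution for a straight line.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing sum((y - a - b*x)**2) over the points."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope from the closed-form least squares solution
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Intercept: the fitted line passes through the point of averages
    a = sum_y / n - b * sum_x / n
    return a, b

# Made-up points lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
print(a, b)
```

Because the points lie exactly on a line, the fit recovers that line; with noisy data the same function returns the best compromise in the least squares sense.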

How do you get this answer?

- Differentiate S with respect to a and b, and set the results to 0

We get:

$$\sum (y_i - a - b x_i) = 0, \qquad \sum x_i (y_i - a - b x_i) = 0$$

These are two equations in two unknowns, and solving them gives the answer.

Example

Some grasshoppers make sound by rubbing their wings against each other. There is a connection between the temperature and the frequency of the movements, unique for each species. Here are some data for Nemobius fasciatus fasciatus:

If you measure 18 movements per second, what is the estimated temperature?

Data from Pierce, GW. The Songs of Insects. Cambridge, Mass.: Harvard University Press, 1949, pp. 12-21

y against x ≠ x against y

- Linear regression of y against x does not give the same result as the opposite.

[Figure: regression of y against x, and regression of x against y]
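A small numerical check makes the asymmetry concrete. The data below are made up, and `fit_line` is a hypothetical helper implementing the standard least squares formulas. The second fit swaps the roles of the variables, and its line is then rewritten in the form y = ... so the two slopes can be compared.

```python
def fit_line(xs, ys):
    """Least squares fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

a_yx, b_yx = fit_line(xs, ys)  # y = a_yx + b_yx * x
a_xy, b_xy = fit_line(ys, xs)  # x = a_xy + b_xy * y

# The second line solved for y has slope 1 / b_xy
slope_inverted = 1 / b_xy
print(b_yx, slope_inverted)
```

The two slopes agree only when the points lie exactly on a line. In general the product `b_yx * b_xy` equals the squared correlation, so the weaker the correlation, the further apart the two lines are.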

Centered variables

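- Assume we subtract the average from both x- and y-values
- We get $\bar{x} = 0$ and $\bar{y} = 0$
- We get $a = 0$ and $b = \sum x_i y_i / \sum x_i^2$
- From the definitions of correlation and standard deviation we get

$$b = r \, \frac{s_y}{s_x}$$

(even in the uncentered case), where r is the sample correlation and $s_x$, $s_y$ are the sample standard deviations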

- Note also: The residuals sum to 0.
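Both facts can be checked numerically. The sketch below uses made-up data and assumes the usual definitions: r is the sample correlation, and the standard deviations are the sample versions returned by `statistics.stdev`.

```python
import math
from statistics import mean, stdev

# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

# Least squares slope and intercept
b = sxy / sxx
a = y_bar - b * x_bar

# The same slope, computed from correlation and standard deviations
r = sxy / math.sqrt(sxx * syy)
b_from_r = r * stdev(ys) / stdev(xs)

# Residuals of the fitted line
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(b, b_from_r, sum(residuals))
```

The residual sum is zero only up to floating-point rounding.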

Analyzing the variance

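- Define
- SSE: Error sum of squares, $\sum (y_i - a - b x_i)^2$
- SSR: Regression sum of squares, $\sum (a + b x_i - \bar{y})^2$
- SST: Total sum of squares, $\sum (y_i - \bar{y})^2$
- We can show that

SST = SSR + SSE

- Define $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
- R² is the ”coefficient of determination”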
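The decomposition can be verified on a small data set. This is a sketch on made-up data, assuming the usual definitions: SSE sums the squared residuals, SSR the squared deviations of the fitted values from the average of y, and SST the squared deviations of the observed y from their average.

```python
# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares line
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar
fitted = [a + b * x for x in xs]

sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
ssr = sum((f - y_bar) ** 2 for f in fitted)          # variation explained by the line
sst = sum((y - y_bar) ** 2 for y in ys)              # total variation

r_squared = ssr / sst
print(sse, ssr, sst, r_squared)
```

An R² near 1 means the line accounts for most of the variation in y; a value near 0 means it accounts for very little.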

But how to answer questions like:

- Given that a positive slope (b) has been estimated: Does it give a reproducible indication that there is a positive trend, or is it a result of random variation?
- What is a confidence interval for the estimated slope?
- What is the prediction, with uncertainty, at a new x value?

The standard simple regression model

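- As before, we have to define a model:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

where $\varepsilon_1, \ldots, \varepsilon_n$ are independent, normally distributed, with mean zero and equal variance $\sigma^2$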

- We can then use data to estimate the model parameters, and to make statements about their uncertainty

Confidence intervals for simple regression

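- In a simple regression model,
- a estimates the intercept $\beta_0$
- b estimates the slope $\beta_1$
- $s_e^2 = \frac{SSE}{n-2}$ estimates the error variance $\sigma^2$
- Also,

$$\frac{b - \beta_1}{s_b} \sim t_{n-2}$$

where $s_b^2 = \frac{s_e^2}{\sum (x_i - \bar{x})^2}$ estimates the variance of b

- So a confidence interval for $\beta_1$ is given by

$$b \pm t_{n-2,\alpha/2} \, s_b$$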
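A confidence interval for the slope can be sketched as follows. The code assumes the standard textbook estimates: error variance SSE/(n-2), and slope variance equal to that value divided by the sum of squared deviations of x. The data and the helper name `slope_with_se` are made up, and the critical value is taken from a t table rather than computed.

```python
import math

def slope_with_se(xs, ys):
    """Return (b, s_b): least squares slope and its estimated standard deviation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s_e2 = sse / (n - 2)  # estimate of the error variance
    return b, math.sqrt(s_e2 / sxx)

# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

b, s_b = slope_with_se(xs, ys)
t_crit = 3.182  # t-table value: n - 2 = 3 degrees of freedom, 95% two-sided
lower, upper = b - t_crit * s_b, b + t_crit * s_b
print(f"b = {b:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With only five points the interval is wide and contains zero, so these data alone do not establish a positive slope.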

Hypothesis testing for simple regression

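- Choose hypotheses: $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$, where $\beta_1$ is the slope
- Test statistic: $t = b / s_b$, where $s_b$ is the estimated standard deviation of b; under $H_0$ it has a $t_{n-2}$ distribution
- Reject $H_0$ if $t < -t_{n-2,\alpha/2}$ or $t > t_{n-2,\alpha/2}$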
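The test can be sketched in code, assuming the two-sided null hypothesis that the slope is zero and the test statistic b divided by its estimated standard deviation. The data, the helper name `slope_t_statistic`, and the choice of a 5% level are illustrative assumptions; the critical value comes from a t table.

```python
import math

def slope_t_statistic(xs, ys):
    """Return the t statistic b / s_b for testing a zero slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s_b = math.sqrt(sse / (n - 2) / sxx)  # estimated standard deviation of b
    return b / s_b

# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

t_stat = slope_t_statistic(xs, ys)
t_crit = 3.182  # t-table value: 3 degrees of freedom, 5% level, two-sided
reject = abs(t_stat) > t_crit
print(t_stat, reject)
```

Here the statistic stays below the critical value, so the null hypothesis of zero slope is not rejected at the 5% level.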

Prediction from a simple regression model

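- A regression model can be used to predict the response at a new value $x_{n+1}$
- The uncertainty in this prediction comes from two sources:
  - The uncertainty in the regression line
  - The uncertainty of any response, given the regression line
- A confidence interval for the prediction $\hat{y}_{n+1} = a + b x_{n+1}$:

$$\hat{y}_{n+1} \pm t_{n-2,\alpha/2} \, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$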
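The two sources of uncertainty show up as separate terms under a square root in the standard prediction interval formula: the "1" accounts for the scatter of a single response, and the remaining terms account for the uncertainty in the fitted line. The sketch below assumes that standard formula; the data and the function name `prediction_interval` are made up.

```python
import math

def prediction_interval(xs, ys, x_new, t_crit):
    """Return (y_hat, lower, upper) for a new response at x_new."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))  # estimated error standard deviation
    y_hat = a + b * x_new
    # "1" = scatter of one response; the rest = uncertainty in the line
    se_pred = s_e * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return y_hat, y_hat - t_crit * se_pred, y_hat + t_crit * se_pred

# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

t_crit = 3.182  # t-table value: 3 degrees of freedom, 95% two-sided
y_hat, lower, upper = prediction_interval(xs, ys, 6.0, t_crit)
print(y_hat, lower, upper)
```

The interval widens as the new x value moves away from the average of the observed x values, which is one reason extrapolation is risky.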

Testing for correlation

- It is also possible to test whether a sample correlation r is large enough to indicate a nonzero population correlation
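- Test statistic: $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, which has a $t_{n-2}$ distribution when the population correlation is zero
- Note: The test only works for normal distributions and linear correlations. Always investigate the scatter plot as well!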
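A sketch of the correlation test, assuming the standard statistic r·sqrt(n-2)/sqrt(1-r²) compared against a t distribution with n-2 degrees of freedom. The data and the function name `correlation_t` are made up for illustration.

```python
import math

def correlation_t(xs, ys):
    """Return (r, t): sample correlation and its t statistic."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t

# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r, t_stat = correlation_t(xs, ys)
print(r, t_stat)
```

In simple regression this statistic coincides with the t statistic for testing a zero slope, so the two tests always lead to the same conclusion.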

Influence of extreme observations

- NOTE: The result of a regression analysis is very much influenced by points with extreme values, in either the x or the y direction.
- Always investigate visually, and determine if outliers are actually erroneous observations
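How strongly a single extreme point can pull the line is easy to demonstrate. In this made-up example, five points lie exactly on a line with slope 1, and one added point with an extreme x value is enough to turn the fitted slope negative. `fit_line` is a hypothetical helper using the standard least squares formulas.

```python
def fit_line(xs, ys):
    """Least squares fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up data: five points exactly on y = x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
_, b_clean = fit_line(xs, ys)

# The same data plus one extreme observation at (10, 0)
_, b_outlier = fit_line(xs + [10.0], ys + [0.0])

print(b_clean, b_outlier)
```

Whether such a point should be removed depends on whether it is an erroneous observation; the code only shows how much it matters.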

Example: Transformed variables

- The relationship between variables may not be linear
- Example: The natural model may be $y = a e^{bx}$
- We want to find a and b so that the curve approximates the points as well as possible

Example (cont.)

- When $y = a e^{bx}$ then $\log(y) = \log(a) + bx$
- Use standard formulas on the pairs (x1,log(y1)), (x2, log(y2)), ..., (xn, log(yn))
- We get estimates for log(a) and b, and thus a and b
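The procedure can be sketched as follows, assuming an exponential model of the form y = a·exp(b·x). The data are generated from made-up parameter values so the result can be checked, and `fit_line` is a hypothetical helper for the standard straight-line fit.

```python
import math

def fit_line(xs, ys):
    """Least squares fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up data generated exactly from y = 2 * exp(0.5 * x)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Fit a straight line to the pairs (x, log(y))
log_a, b = fit_line(xs, [math.log(y) for y in ys])
a = math.exp(log_a)  # transform the intercept back
print(a, b)
```

Note that this minimizes squared errors in log(y), not in y, so with noisy data the result differs from a direct least squares fit of the curve.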

Another example of transformed variables

- Another natural model may be $y = a x^b$
- We get that $\log(y) = \log(a) + b \log(x)$
- Use standard formulas on the pairs (log(x1), log(y1)), (log(x2), log(y2)), ..., (log(xn), log(yn))

Note: In this model, the curve goes through (0,0) when b > 0
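A sketch of the log-log fit, assuming a power model of the form y = a·x^b with positive x and y values. The parameter values used to generate the data are made up, and `fit_line` is a hypothetical helper for the standard straight-line fit.

```python
import math

def fit_line(xs, ys):
    """Least squares fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up data generated exactly from y = 3 * x ** 1.5
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 1.5 for x in xs]

# Fit a straight line to the pairs (log(x), log(y))
log_a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
a = math.exp(log_a)  # transform the intercept back
print(a, b)
```

Both logarithms require strictly positive values, so observations with x = 0 or y = 0 cannot be used in this fit.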

More than one independent variable: Multiple regression

- Assume we have data of the type

(x11, x12, x13, y1), (x21, x22, x23, y2), ...

- We want to ”explain” y from the x-values by fitting the following model:

$$y_i = a + b x_{i1} + c x_{i2} + d x_{i3} + \text{error}_i$$
- Just like before, one can produce formulas for a,b,c,d minimizing the sum of the squares of the ”errors”.
- x1,x2,x3 can be transformations of different variables, or transformations of the same variable
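One common way to obtain these formulas is to solve the normal equations, (XᵀX)·coef = Xᵀy, where each row of X holds a 1 (for the intercept) followed by the x-values of one observation. The sketch below does this in plain Python for a model with an intercept and three explanatory variables; the data, the coefficient values, and the function names `solve` and `fit_multiple` are made up for illustration, and statistical software normally uses numerically more robust methods.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs], copied so the inputs are untouched
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_multiple(rows, ys):
    """Least squares coefficients [a, b, c, ...] for y = a + b*x1 + c*x2 + ..."""
    design = [[1.0] + [float(v) for v in row] for row in rows]  # 1 for the intercept
    k = len(design[0])
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Made-up data generated exactly from y = 1 + 2*x1 - x2 + 0.5*x3
rows = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (2, 1, 3)]
ys = [1 + 2 * x1 - x2 + 0.5 * x3 for x1, x2, x3 in rows]

a, b, c, d = fit_multiple(rows, ys)
print(a, b, c, d)
```

If one explanatory variable is an exact linear combination of the others, the matrix XᵀX is singular and this system has no unique solution.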

Multiple regression model

- The errors are independent random (normal) variables with expectation zero and variance $\sigma^2$
- The explanatory variables x1i, x2i, …, xni cannot be exactly linearly related to one another

Use of multiple regression

- Versions of multiple regression are the most widely used models in econometrics and in health economics
- It is a powerful tool to detect and verify connections between variables

Doing a regression analysis

- Plot the data first, to investigate whether there is a natural relationship
- Linear or transformed model?
- Are there outliers which will unduly affect the result?
- Fit a model. Different models with the same number of parameters may be compared using R²
- Make tests / confidence intervals for parameters

Interpretation

- The parameters may have important interpretations
- The model may be used for prediction at new values (caution: Extrapolation can sometimes be dangerous!)
- Remember that subjective choices have been made, and interpret cautiously
