Regression - PowerPoint PPT Presentation


Regression. Petter Mostad 2005.10.10. Some problems you might want to look at. Given the annual number of cancers of a certain type, over a few decades, make a prediction for the future, with uncertainty.

Presentation Transcript


Petter Mostad


Some problems you might want to look at
  • Given the annual number of cancers of a certain type, over a few decades, make a prediction for the future, with uncertainty.
  • There seems to be a connection between efficiency and size for Norwegian hospitals. Given data from many hospitals, determine if there is a connection, and what it is.
  • Investigate the connection between efficiency and a number of possible explanatory variables.
Connection between variables

We would like to study the connection between x and y!

What can you do with a fitted line?
  • Interpolation
  • Extrapolation (sometimes dangerous!)
  • Interpret the parameters of the line
How to define the line that ”fits best”?

The sum of the squares of the ”errors” is minimized: the least squares method!

  • Note: many other ways to fit the line can be imagined
How to compute the line fit with the least squares method?
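  • Let (x1, y1), (x2, y2), ..., (xn, yn) denote the points in the plane.
  • Find a and b so that y = a + bx fits the points by minimizing $S = \sum_i (y_i - a - b x_i)^2$
  • Solution: $b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $a = \bar{y} - b\bar{x}$

where $\bar{x} = \frac{1}{n}\sum_i x_i$, $\bar{y} = \frac{1}{n}\sum_i y_i$, and all sums are done for i=1,...,n.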
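As a minimal sketch of the least squares fit, the slope can be computed as the sum of products of deviations from the means divided by the sum of squared x-deviations, and the intercept as the mean of y minus the slope times the mean of x. The function name `fit_line` and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the sum of (y - a - b*x)**2 over all points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical points lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
print(a, b)
```

Because the illustrative points lie exactly on a line, the fit recovers that line's intercept and slope.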

How do you get this answer?
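  • Differentiate S with respect to a and b, and set the results to 0

We get:

$\frac{\partial S}{\partial a} = -2\sum_i (y_i - a - b x_i) = 0$ and $\frac{\partial S}{\partial b} = -2\sum_i x_i (y_i - a - b x_i) = 0$

These are two equations with two unknowns, and their solution gives the answer.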
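A sketch of solving the two equations directly: rearranged, they read n·a + b·Σx = Σy and a·Σx + b·Σx² = Σxy, a 2×2 linear system that Cramer's rule handles. The function name `solve_normal_equations` and the data are assumptions for illustration only.

```python
def solve_normal_equations(xs, ys):
    """Solve n*a + b*sum(x) = sum(y) and a*sum(x) + b*sum(x^2) = sum(x*y)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Determinant of the 2x2 coefficient matrix [[n, sx], [sx, sxx]]
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


# Hypothetical data
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
a, b = solve_normal_equations(xs, ys)
print(a, b)
```

For these points the solution is a = 0.5 and b = 1.4, up to floating-point rounding.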


Some grasshoppers make sound by rubbing their wings against each other. There is a connection between the temperature and the frequency of the movements, unique for each species. Here are some data for Nemobius fasciatus fasciatus:

If you measure 18 movements per second, what is the estimated temperature?

Data from Pierce, GW. The Songs of Insects. Cambridge, Mass.: Harvard University Press, 1949, pp. 12-21

Example (cont.)


Answer: Estimated temperature

y against x ≠ x against y
  • Linear regression of y against x does not give the same result as the opposite.

[Figure: two fitted lines, labelled "Regression of y against x" and "Regression of x against y"]
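The asymmetry can be checked numerically. The sketch below fits both directions on made-up data and rewrites the x-against-y line in the form y = ... so the two slopes can be compared; `fit_line` is a hypothetical helper, not part of the presentation.

```python
def fit_line(us, vs):
    """Least squares fit of v = intercept + slope*u; returns (intercept, slope)."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    suv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    suu = sum((u - u_bar) ** 2 for u in us)
    slope = suv / suu
    return v_bar - slope * u_bar, slope


# Hypothetical data
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

a_yx, b_yx = fit_line(xs, ys)  # y = a_yx + b_yx * x
a_xy, b_xy = fit_line(ys, xs)  # x = a_xy + b_xy * y

# Solving x = a_xy + b_xy * y for y gives a line with slope 1 / b_xy
slope_from_x_on_y = 1.0 / b_xy
print(b_yx, slope_from_x_on_y)
```

With these points the two slopes are 1.4 and roughly 1.43. They would coincide only if the points lay exactly on a line.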

Centered variables
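  • Assume we subtract the average from both x- and y-values
  • We get $\bar{x} = 0$ and $\bar{y} = 0$
  • We get $a = 0$ and $b = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$
  • From the definitions of correlation and standard deviation we get $b = r \frac{s_y}{s_x}$

(even in the uncentered case)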

  • Note also: The residuals sum to 0.
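The following sketch checks these facts on made-up data: the slope from centered variables, the identity slope = r · s_y / s_x, and the residuals summing to zero. All variable names are illustrative assumptions.

```python
import statistics

# Hypothetical data
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
n = len(xs)

x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
xc = [x - x_bar for x in xs]  # centered x-values
yc = [y - y_bar for y in ys]  # centered y-values

# With centered variables the intercept is 0 and the slope is sum(x*y)/sum(x*x)
b = sum(x * y for x, y in zip(xc, yc)) / sum(x * x for x in xc)

# Sample standard deviations and sample correlation
s_x = statistics.stdev(xs)
s_y = statistics.stdev(ys)
r = sum(x * y for x, y in zip(xc, yc)) / ((n - 1) * s_x * s_y)

# Residuals of the uncentered fit y = a + b*x
a = y_bar - b * x_bar
residuals = [y - a - b * x for x, y in zip(xs, ys)]

print(b, r * s_y / s_x, sum(residuals))
```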
Analyzing the variance
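  • Define
    • SSE: Error sum of squares, $SSE = \sum_i (y_i - a - b x_i)^2$
    • SSR: Regression sum of squares, $SSR = \sum_i (a + b x_i - \bar{y})^2$
    • SST: Total sum of squares, $SST = \sum_i (y_i - \bar{y})^2$
  • We can show that $SST = SSR + SSE$
  • Define $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
  • $R^2$ is the ”coefficient of determination”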
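A small sketch of the decomposition, using made-up data: SSE sums the squared residuals, SSR the squared deviations of the fitted values from the mean of y, and SST the squared deviations of y from its mean. The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

a, b = fit_line(xs, ys)
y_bar = sum(ys) / len(ys)
fitted = [a + b * x for x in xs]

sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # error sum of squares
ssr = sum((f - y_bar) ** 2 for f in fitted)          # regression sum of squares
sst = sum((y - y_bar) ** 2 for y in ys)              # total sum of squares

r_squared = ssr / sst
print(sse, ssr, sst, r_squared)
```

For these points SST = 10, SSE = 0.2 and SSR = 9.8 (up to rounding), so the coefficient of determination is 0.98.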
But how to answer questions like:
  • Given that a positive slope (b) has been estimated: Does it give a reproducible indication that there is a positive trend, or is it a result of random variation?
  • What is a confidence interval for the estimated slope?
  • What is the prediction, with uncertainty, at a new x value?
The standard simple regression model
  • We have to do as before, and define a model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

where the $\varepsilon_i$ are independent, normally distributed, with mean zero and equal variance $\sigma^2$

  • We can then use data to estimate the model parameters, and to make statements about their uncertainty
Confidence intervals for simple regression
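  • In a simple regression model,
    • a estimates $\beta_0$
    • b estimates $\beta_1$
    • $s^2 = \frac{SSE}{n-2}$ estimates $\sigma^2$
  • Also, $\frac{b - \beta_1}{s_b} \sim t_{n-2}$

where $s_b^2 = \frac{s^2}{\sum_i (x_i - \bar{x})^2}$ estimates the variance of b

  • So a confidence interval for $\beta_1$ is given by $b \pm t_{n-2,\,\alpha/2}\, s_b$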
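A minimal sketch of the slope interval, assuming the usual estimates: residual variance SSE/(n−2) and slope variance equal to that divided by the sum of squared x-deviations. The Python standard library has no t distribution, so the critical value is passed in from a t table; the function name and data are illustrative assumptions.

```python
import math


def slope_confidence_interval(xs, ys, t_crit):
    """Return (low, high) for the slope; t_crit is the t quantile with n-2 df."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)         # estimate of the error variance
    s_b = math.sqrt(s2 / sxx)  # estimated standard deviation of the slope
    return b - t_crit * s_b, b + t_crit * s_b


# Hypothetical data; 4.303 is the tabulated 97.5% t quantile for 2 degrees of freedom
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
low, high = slope_confidence_interval(xs, ys, t_crit=4.303)
print(low, high)
```

With only four points the interval is wide, which reflects how little information two degrees of freedom carry.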
Hypothesis testing for simple regression
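  • Choose hypotheses: $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$
  • Test statistic: $t = \frac{b}{s_b}$, which follows a $t_{n-2}$ distribution under $H_0$
  • Reject $H_0$ if $t < -t_{n-2,\,\alpha/2}$ or $t > t_{n-2,\,\alpha/2}$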
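As a sketch of the two-sided test for a zero slope: the statistic is the estimated slope divided by its estimated standard deviation, compared against a t critical value with n−2 degrees of freedom. The critical value is taken from a table, and the names and data are made up for illustration.

```python
import math


def slope_t_statistic(xs, ys):
    """Return the slope estimate divided by its estimated standard deviation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s_b = math.sqrt(sse / (n - 2) / sxx)
    return b / s_b


# Hypothetical data; 4.303 is the tabulated 97.5% t quantile for 2 degrees of freedom
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
t_stat = slope_t_statistic(xs, ys)
reject_h0 = abs(t_stat) > 4.303  # two-sided test at the 5% level
print(t_stat, reject_h0)
```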
Prediction from a simple regression model
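  • A regression model can be used to predict the response at a new value $x_{n+1}$
  • The uncertainty in this prediction comes from two sources:
    • The uncertainty in the regression line
    • The uncertainty of any response, given the regression line
  • A confidence interval for the prediction: $a + b x_{n+1} \pm t_{n-2,\,\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$, where $s^2 = \frac{SSE}{n-2}$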
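A sketch of the prediction interval under the standard model. The two sources of uncertainty show up as separate terms under the square root: 1 for the new response itself, and 1/n plus the scaled squared distance from the mean of x for the fitted line. The function name, data and tabulated critical value are assumptions for illustration.

```python
import math


def prediction_interval(xs, ys, x_new, t_crit):
    """Return (prediction, low, high) for a new response at x_new."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    # 1: new response; 1/n + (...)/sxx: uncertainty in the regression line
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    pred = a + b * x_new
    return pred, pred - half_width, pred + half_width


# Hypothetical data; 4.303 is the tabulated 97.5% t quantile for 2 degrees of freedom
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
pred, low, high = prediction_interval(xs, ys, x_new=5.0, t_crit=4.303)
print(pred, low, high)
```

The interval widens as the new x value moves away from the mean of the observed x-values, which is one way to see why extrapolation is risky.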
Testing for correlation
  • It is also possible to test whether a sample correlation r is large enough to indicate a nonzero population correlation
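  • Test statistic: $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, which follows a $t_{n-2}$ distribution when the population correlation is zero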
  • Note: The test only works for normal distributions and linear correlations: Always also investigate scatter plot!
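A sketch of the correlation test, assuming the usual statistic r·√(n−2)/√(1−r²) compared against a t distribution with n−2 degrees of freedom. The function name and data are hypothetical.

```python
import math


def correlation_t_statistic(xs, ys):
    """Return (r, t) where t = r*sqrt(n-2)/sqrt(1-r^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)  # sample correlation
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return r, t


# Hypothetical data
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
r, t_stat = correlation_t_statistic(xs, ys)
print(r, t_stat)
```

On the same data this statistic equals the t statistic for a zero slope in simple regression, so the two tests agree.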
Influence of extreme observations
  • NOTE: The result of a regression analysis is very much influenced by points with extreme values, in either the x or the y direction.
  • Always investigate visually, and determine if outliers are actually erroneous observations
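To illustrate how strongly one extreme point can pull the fit, the sketch below fits a line to made-up data with and without a single added outlier. The outlier (10, 0) and the helper `fit_line` are invented for this example.

```python
def fit_line(xs, ys):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data with a clear upward trend
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
_, slope_clean = fit_line(xs, ys)

# The same data plus one point that is extreme in both x and y
_, slope_with_outlier = fit_line(xs + [10.0], ys + [0.0])

print(slope_clean, slope_with_outlier)
```

Here a single point changes the slope from 1.4 to about -0.34, reversing the sign of the estimated trend.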
Example: Transformed variables
  • The relationship between variables may not be linear
  • Example: The natural model may be $y = a e^{bx}$
  • We want to find a and b so that the curve approximates the points as well as possible
Example (cont.)
  • When $y = a e^{bx}$ then $\log(y) = \log(a) + bx$
  • Use standard formulas on the pairs (x1,log(y1)), (x2, log(y2)), ..., (xn, log(yn))
  • We get estimates for log(a) and b, and thus a and b
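A sketch of this procedure, assuming the exponential model y = a·e^(bx): fit a line to the pairs (x, log(y)), then exponentiate the intercept to recover a. The function names and the data, generated exactly from a = 2 and b = 0.5, are illustrative.

```python
import math


def fit_line(xs, ys):
    """Least squares fit of y = intercept + slope*x; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) by regressing log(y) on x; requires all y > 0."""
    log_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(log_a), b


# Hypothetical data generated from y = 2*exp(0.5*x)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(a, b)
```

Note that this minimizes squared errors on the log scale, which is not the same as minimizing squared errors in y itself.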
Another example of transformed variables
  • Another natural model may be $y = a x^b$
  • We get that $\log(y) = \log(a) + b\log(x)$
  • Use standard formulas on the pairs (log(x1), log(y1)), (log(x2), log(y2)), ..., (log(xn), log(yn))

Note: In this model, the curve goes through (0,0) when b > 0
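A sketch of the log-log fit, assuming the power model y = a·x^b: regress log(y) on log(x), so the slope estimates b and the exponentiated intercept estimates a. The names and data, generated exactly from a = 3 and b = 2, are made up for illustration.

```python
import math


def fit_line(xs, ys):
    """Least squares fit of y = intercept + slope*x; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_power(xs, ys):
    """Fit y = a*x**b by regressing log(y) on log(x); requires x > 0 and y > 0."""
    log_a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(log_a), b


# Hypothetical data generated from y = 3*x**2
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 2 for x in xs]
a, b = fit_power(xs, ys)
print(a, b)
```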

More than one independent variable: Multiple regression
  • Assume we have data of the type

(x11, x12, x13, y1), (x21, x22, x23, y2), ...

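  • We want to ”explain” y from the x-values by fitting the following model: $y_i = a + b x_{i1} + c x_{i2} + d x_{i3} + \varepsilon_i$, where $\varepsilon_i$ is the error for observation i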
  • Just like before, one can produce formulas for a,b,c,d minimizing the sum of the squares of the ”errors”.
  • x1,x2,x3 can be transformations of different variables, or transformations of the same variable
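One way to produce such estimates is to solve the normal equations (XᵀX)β = Xᵀy, where X holds a column of ones followed by the explanatory variables. The sketch below does this in plain Python with Gaussian elimination. The function names and the data, generated exactly from y = 1 + 2·x1 − x2 + 0.5·x3, are illustrative assumptions, and numerical libraries use more stable methods in practice.

```python
def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for stability
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_multiple(rows, ys):
    """Return [a, b, c, ...] minimizing the sum of squared errors."""
    X = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 - x2 + 0.5*x3
rows = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
        (1.0, 1.0, 0.0), (1.0, 1.0, 1.0), (2.0, 0.0, 1.0)]
ys = [1 + 2 * x1 - x2 + 0.5 * x3 for x1, x2, x3 in rows]
coeffs = fit_multiple(rows, ys)
print(coeffs)
```

The system has a unique solution only when the explanatory variables are not exactly linearly related, which is the condition the model places on them.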
Multiple regression model
  • The errors are independent random (normal) variables with expectation zero and variance $\sigma^2$
  • The explanatory variables x1i, x2i, …, xni cannot be exactly linearly related
Use of multiple regression
  • Versions of multiple regression are the most used models in econometrics and in health economics
  • It is a powerful tool to detect and verify connections between variables
Doing a regression analysis
  • Plot the data first, to investigate whether there is a natural relationship
  • Linear or transformed model?
  • Are there outliers which will unduly affect the result?
  • Fit a model. Different models with the same number of parameters may be compared with $R^2$
  • Make tests / confidence intervals for parameters
  • The parameters may have important interpretations
  • The model may be used for prediction at new values (caution: Extrapolation can sometimes be dangerous!)
  • Remember that subjective choices have been made, and interpret cautiously