Linear regression models
Simple Linear Regression


History

  • Developed by Sir Francis Galton (1822-1911) in his article “Regression towards mediocrity in hereditary stature”


Purposes:

  • To describe the linear relationship between two continuous variables, the response variable (y-axis) and a single predictor variable (x-axis)

  • To determine how much of the variation in Y can be explained by the linear relationship with X, and how much of that variation remains unexplained

  • To predict new values of Y from new values of X


The linear regression model is:

Yi = β0 + β1 Xi + εi

  • Xi and Yi are paired observations (i = 1 to n)

  • β0 = population intercept (the value of Yi when Xi = 0)

  • β1 = population slope (measures the change in Yi per unit change in Xi)

  • εi = the random or unexplained error associated with the ith observation. The εi are assumed to be independent and distributed as N(0, σ²).
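To make these terms concrete, here is a minimal simulation sketch of the model (the parameter values β0 = 2, β1 = 0.5, σ = 1 and n = 30 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population parameters (illustrative only)
beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 30

x = rng.uniform(0, 10, size=n)        # predictor values X_i
eps = rng.normal(0.0, sigma, size=n)  # errors eps_i, independent N(0, sigma^2)
y = beta0 + beta1 * x + eps           # responses Y_i = beta0 + beta1*X_i + eps_i
```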


Linear relationship

(Figure: a straight line in the X–Y plane; it crosses the Y-axis at the intercept β0 and rises by β1 for every 1.0-unit increase in X.)


Linear models approximate non-linear functions over a limited domain

(Figure: a non-linear curve with a straight line fitted over its central region; the linear fit works for interpolation within the sampled range of X but not for extrapolation beyond it.)


Linear regression models

  • For a given value of X, the sampled Y values are independent with normally distributed errors:

Yi = β0 + β1 Xi + εi

εi ~ N(0, σ²)  →  E(εi) = 0

E(Yi) = β0 + β1 Xi

(Figure: normal distributions of Y centred on the regression line at two predictor values X1 and X2, with expected values E(Y1) and E(Y2).)


Fitting data to a linear model

(Figure: an observed value Yi and the corresponding fitted value Ŷi on the regression line at Xi; the vertical distance Yi − Ŷi = εi is the residual.)


The residual

The residual sum of squares:

SSresidual = Σ (Yi − Ŷi)²


Estimating regression parameters

  • The “best fit” estimates of the regression population parameters (β0 and β1) are the values that minimize the residual sum of squares (SSresidual) between each observed value and the value predicted by the model:

SSresidual = Σ (Yi − Ŷi)², where Ŷi = b0 + b1 Xi
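A brief derivation sketch, writing b0 and b1 for the estimates of β0 and β1: setting both partial derivatives of SSresidual to zero gives the closed-form estimates that appear on the following slides.

```latex
% Minimize SS_res(b0, b1) = sum_i (Y_i - b_0 - b_1 X_i)^2
% over the candidate coefficients b_0 and b_1.
\begin{align*}
\frac{\partial SS_{\mathrm{res}}}{\partial b_0} = 0
  &\;\Longrightarrow\; b_0 = \bar{Y} - b_1 \bar{X} \\
\frac{\partial SS_{\mathrm{res}}}{\partial b_1} = 0
  &\;\Longrightarrow\; b_1
   = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
   = \frac{SS_{XY}}{SS_X}
\end{align*}
```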


Sum of squares:

SSX = Σ (Xi − X̄)²

Sum of cross products:

SSXY = Σ (Xi − X̄)(Yi − Ȳ)


Least-squares parameter estimates

The least-squares estimate of the slope is:

b1 = SSXY / SSX

where


Sample variance of X:

sX² = SSX / (n − 1)

Sample covariance:

sXY = SSXY / (n − 1)

so that b1 can equivalently be written as sXY / sX².


Solving for the intercept:

b0 = Ȳ − b1 X̄

Thus, our estimated regression equation is:

Ŷi = b0 + b1 Xi
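As a sketch, the slope and intercept estimates can be computed directly from the sums of squares and cross products (the data here are made up for illustration):

```python
import numpy as np

def least_squares_fit(x, y):
    """Estimate the intercept b0 and slope b1 by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ss_x = np.sum((x - x.mean()) ** 2)               # sum of squares of X
    ss_xy = np.sum((x - x.mean()) * (y - y.mean()))  # sum of cross products
    b1 = ss_xy / ss_x                                # slope estimate
    b0 = y.mean() - b1 * x.mean()                    # intercept estimate
    return b0, b1

b0, b1 = least_squares_fit([1, 2, 3, 4, 5], [2.1, 2.9, 4.2, 4.8, 6.1])
print(f"Y-hat = {b0:.3f} + {b1:.3f} * X")
```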


Hypothesis tests with regression

  • The null hypothesis is that there is no linear relationship between X and Y:

    H0: β1 = 0  →  Yi = β0 + εi

    HA: β1 ≠ 0  →  Yi = β0 + β1 Xi + εi

  • We can use an F-ratio (i.e., a ratio of variances) to test these hypotheses


Variance of the error of regression:

MSresidual = SSresidual / (n − 2)

NOTE: this is also referred to as the residual variance, mean squared error (MSE) or residual mean square (MSresidual)


Mean square of regression:

MSregression = SSregression / 1, where SSregression = Σ (Ŷi − Ȳ)²

The F-ratio is MSregression / MSresidual.

Under H0, this ratio follows the F-distribution with (1, n − 2) degrees of freedom.
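A minimal sketch of this F-test in code (scipy's F-distribution supplies the p-value; the data are illustrative):

```python
import numpy as np
from scipy import stats

def regression_f_test(x, y):
    """F-test of H0: beta1 = 0 in simple linear regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x
    ms_reg = np.sum((y_hat - y.mean()) ** 2) / 1   # SS_regression / 1 df
    ms_res = np.sum((y - y_hat) ** 2) / (n - 2)    # SS_residual / (n - 2) df
    f_ratio = ms_reg / ms_res
    p_value = stats.f.sf(f_ratio, 1, n - 2)        # upper tail of F(1, n - 2)
    return f_ratio, p_value

f_ratio, p_value = regression_f_test([1, 2, 3, 4, 5], [2.1, 2.9, 4.2, 4.8, 6.1])
print(f"F = {f_ratio:.2f}, p = {p_value:.4f}")
```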

Parametric confidence intervals

  • If we assume our parameter of interest has a particular sampling distribution, and we have estimated its expected value and variance, we can construct a confidence interval for a given percentile.

  • Example: if we assume Y is a normal random variable with unknown mean μ and variance σ², then (Ȳ − μ) / (σ/√n) is distributed as a standard normal variable. But since we don’t know σ, we must divide by the estimated standard error instead, giving (Ȳ − μ) / (s/√n), which follows a t-distribution with (n − 1) degrees of freedom.

  • The 100(1 − α)% confidence interval for μ is then given by Ȳ ± t(α/2, n−1) · s/√n (a code sketch follows after this list).

  • IMPORTANT: this does not mean “There is a 100(1-α)% chance that the true population mean μ occurs inside this interval.” It means that if we were to repeatedly sample the population in the same way, 100(1-α)% of the confidence intervals would contain the true population mean μ.
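A sketch of this t-based interval for μ in code (the sample values are made up; scipy provides the t quantile):

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(y, alpha=0.05):
    """100(1 - alpha)% confidence interval for the population mean of y."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    se = y.std(ddof=1) / np.sqrt(n)                # estimated standard error of the mean
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # t quantile with n - 1 degrees of freedom
    return y.mean() - t_crit * se, y.mean() + t_crit * se

low, high = mean_confidence_interval([4.1, 5.2, 4.8, 5.5, 4.9, 5.0])
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```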


Regression


Assumptions of regression

  • The linear model correctly describes the functional relationship between X and Y

  • The X variable is measured without error

  • For a given value of X, the sampled Y values are independent with normally distributed errors

  • Variances are constant along the regression line
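These assumptions are commonly checked informally by examining the residuals of the fitted model; here is a brief sketch (the data and the choice of tests are illustrative, not from the slides):

```python
import numpy as np
from scipy import stats

def residual_checks(x, y):
    """Informal checks of the regression assumptions via the residuals."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    # Normality of the errors: Shapiro-Wilk test on the residuals
    _, p_normality = stats.shapiro(resid)
    # Constant variance: |residuals| should not trend with X
    _, p_constant_var = stats.pearsonr(np.abs(resid), x)
    return {"normality_p": p_normality, "constant_variance_p": p_constant_var}

print(residual_checks([1, 2, 3, 4, 5, 6], [2.2, 2.8, 4.1, 4.7, 6.3, 6.6]))
```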