PSY 1950 Regression November 10, 2008

PSY 1950

Regression

November 10, 2008

Definition
  • Simple linear regression
    • Models the linear relationship between one predictor variable and one outcome variable
    • e.g., predicting income based upon age
  • Multiple linear regression
    • Models the linear relationship between two or more predictor variables and one outcome variable
    • e.g., predicting income based upon age and sex
  • Lingo
    • Independent variable = predictor; dependent variable = outcome
History
  • Astronomical predictions: method of least squares
    • Piazzi (1801) spotted Ceres, made 22 observations over 41 days, got sick, lost Ceres
    • Gauss: “... for it is now clearly shown that the orbit of a heavenly body may be determined quite nearly from good observations embracing only a few days; and this without any hypothetical assumption.”
  • Genetics: Regression to the mean
    • Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute, 15, 246–263.
Lines
  • Mathematically, a line is defined by its slope and intercept
    • Slope is change in Y per change in X
    • Intercept is the point at which the line crosses the Y-axis, i.e., Y when X = 0
  • Y = bX + a
    • b is slope
    • a is intercept
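As a minimal sketch of the slope and intercept definitions above, the following Python snippet evaluates a line Y = bX + a. The function name `predict` and the numbers are hypothetical, chosen only for illustration.

```python
def predict(x, slope, intercept):
    """Return Y on the line Y = bX + a for a given X."""
    return slope * x + intercept

# Hypothetical line: Y rises 1.5 units per unit of X and crosses the Y-axis at 10.
b, a = 1.5, 10.0

y_at_zero = predict(0, b, a)                    # Y when X = 0, i.e., the intercept
rise = predict(31, b, a) - predict(30, b, a)    # change in Y per unit change in X, i.e., the slope

print(y_at_zero, rise)
```

Evaluating the line at X = 0 returns the intercept, and the difference between two predictions one unit of X apart returns the slope.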
Residuals
  • Residuals are
    • Errors in prediction
    • Difference between expected values (under your model) and observed values (in your dataset)
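The definition above can be sketched in a few lines of Python. The sign convention (observed minus predicted) is an assumption, since the slide does not state one, and the data and line are made up for illustration.

```python
def residuals(xs, ys, slope, intercept):
    """Observed Y minus the Y predicted by the line Y = bX + a."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical data and a hypothetical model, Y = 2X + 1
xs = [1, 2, 3, 4]
ys = [3.0, 4.5, 7.5, 9.0]

res = residuals(xs, ys, slope=2.0, intercept=1.0)
print(res)
```

A residual of zero means the point lies exactly on the line; positive and negative values mark points above and below it.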
Minimizing Residuals
  • Can define the best-fit line as the one that minimizes the sum of
    • Absolute residuals (Method of Least Absolute Deviations)
    • Squared residuals (Method of Least Squares)
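To make the two criteria concrete, here is a rough Python sketch of both sums, together with the standard closed-form least squares fit for one predictor. The function names and the small dataset are illustrative assumptions, not taken from the slides.

```python
def sum_abs_residuals(xs, ys, b, a):
    """Criterion minimized by the method of least absolute deviations."""
    return sum(abs(y - (b * x + a)) for x, y in zip(xs, ys))

def sum_sq_residuals(xs, ys, b, a):
    """Criterion minimized by the method of least squares."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Closed-form least squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return b, a

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, a = least_squares(xs, ys)
best = sum_sq_residuals(xs, ys, b, a)
nudged = sum_sq_residuals(xs, ys, b + 0.1, a)   # any other line does worse
print(b, a, best, nudged)
```

Nudging the slope away from the fitted value increases the summed squared residuals, which is what "best fit" means under this criterion.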
Which is Better?
  • Method of Least Squares
    • Not robust (outliers pull the line strongly)
    • Stable (line doesn’t “jump” with small changes in X)
    • Only one solution (unique line for each dataset)
  • Method of Least Absolute Deviations
    • Robust (outliers have less influence on the line)
    • Unstable (line does “jump” with small changes in X)
    • Multiple solutions (sometimes)
  • http://www.math.wpi.edu/Course_Materials/SAS/lablets/7.3/7.3c/lab73c.html
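The robustness contrast above can be illustrated with a small Python sketch. The least absolute deviations fit here is a brute-force search over lines through pairs of data points; it relies on the known property that some optimal line passes through at least two data points, and it is an illustrative shortcut, not how statistical software typically solves the problem. The data are made up: five points on Y = 2X + 1 plus one outlier.

```python
from itertools import combinations

def least_squares(xs, ys):
    """Closed-form least squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return b, mean_y - b * mean_x

def least_absolute_deviations(xs, ys):
    """Brute-force fit: try every line through two data points with
    distinct X and keep the one with the smallest summed absolute residual."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as Y = bX + a
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        total = sum(abs(y - (b * x + a)) for x, y in zip(xs, ys))
        if best is None or total < best[0]:
            best = (total, b, a)
    return best[1], best[2]

xs = [1, 2, 3, 4, 5, 6]
ys = [3, 5, 7, 9, 11, 40]   # Y = 2X + 1, except for one outlier at X = 6

ls_b, ls_a = least_squares(xs, ys)
lad_b, lad_a = least_absolute_deviations(xs, ys)
print(f"least squares:             Y = {ls_b:.2f}X + {ls_a:.2f}")
print(f"least absolute deviations: Y = {lad_b:.2f}X + {lad_a:.2f}")
```

In this example the least absolute deviations line stays on the five well-behaved points, while the least squares line is pulled toward the outlier.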
Multiple Solutions
  • Any line within the “green zone” produces the same summed residuals via the method of least absolute deviations
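A small made-up dataset shows the same effect numerically. With two observations at each of two X values, any line that passes between the pair at each X gives the same summed absolute residual, while the summed squared residual still singles out one line. The points and candidate lines below are hypothetical.

```python
def sum_abs_residuals(xs, ys, b, a):
    return sum(abs(y - (b * x + a)) for x, y in zip(xs, ys))

def sum_sq_residuals(xs, ys, b, a):
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Two observations at X = 0 and two at X = 2
xs = [0, 0, 2, 2]
ys = [0, 2, 0, 2]

# Candidate (slope, intercept) pairs, all passing between the points at each X
candidates = [(0.0, 1.0), (1.0, 0.0), (-1.0, 2.0), (0.5, 0.5)]

abs_totals = [sum_abs_residuals(xs, ys, b, a) for b, a in candidates]
sq_totals = [sum_sq_residuals(xs, ys, b, a) for b, a in candidates]

print(abs_totals)  # [4.0, 4.0, 4.0, 4.0]: every candidate ties
print(sq_totals)   # [4.0, 8.0, 8.0, 5.0]: the flat line Y = 1 is the single best
```

All four lines tie under least absolute deviations, but least squares prefers the flat line Y = 1.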
Correlation and Regression
  • Statistical distinction based on nature of the variables
    • In correlation, both X and Y are random
    • In regression, X is fixed and Y is random
  • Practical distinction based on interest of researcher
    • With correlation, the researcher asks: What is the strength (and direction) of the linear relationship between X and Y?
    • With regression, the researcher asks the above and/or: How do I predict Y given X?
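The two questions can be answered from the same data, as this Python sketch suggests. It computes Pearson's r and the least squares slope, and checks the standard identity b = r × (s_Y / s_X) that links them. The helper names and data are illustrative assumptions.

```python
import math

def sums_of_squares(xs, ys):
    """Return (SS_X, SS_Y, SP_XY): sums of squared and cross-product deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return ss_x, ss_y, sp_xy

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

ss_x, ss_y, sp_xy = sums_of_squares(xs, ys)

r = sp_xy / math.sqrt(ss_x * ss_y)   # correlation: strength and direction
b = sp_xy / ss_x                     # regression: predicted change in Y per unit X
b_from_r = r * math.sqrt(ss_y / ss_x)

print(round(r, 3), round(b, 3), round(b_from_r, 3))
```

The correlation is unit-free and symmetric in X and Y, whereas the slope is expressed in units of Y per unit of X and depends on which variable is being predicted.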
Goodness of Fit
  • The regression equation does not reveal how well your data fit your model
    • e.g., two different sets of data can produce the same regression equation
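As a sketch of this point, the two made-up datasets below share the same least squares line, Y = 2X + 1, but differ greatly in how closely the points follow it. The proportion of variance explained (r²) is used here as one possible fit measure; that choice is an assumption, not something the slide specifies.

```python
def fit_and_r_squared(xs, ys):
    """Return (slope, intercept, r_squared) for a least squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sp_xy / ss_x
    a = mean_y - b * mean_x
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return b, a, 1 - ss_resid / ss_total

xs = [1, 2, 3, 4, 5]
# Same underlying line plus a scatter pattern that does not shift the fit;
# only the size of the scatter differs between the two datasets.
ys_tight = [3.5, 4.0, 8.0, 8.0, 11.5]
ys_loose = [6.0, -1.0, 13.0, 3.0, 14.0]

b1, a1, r2_tight = fit_and_r_squared(xs, ys_tight)
b2, a2, r2_loose = fit_and_r_squared(xs, ys_loose)

print(b1, a1, round(r2_tight, 3))
print(b2, a2, round(r2_loose, 3))
```

Both fits return the same slope and intercept, so the equation alone cannot distinguish the tightly clustered data from the widely scattered data.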
Standard Error of Estimate
  • The standard residual: s = √( Σ(Y − Ŷ)² / (n − 2) )
  • Why df = n - 2?
    • To determine regression equation (and thus the residuals), we need to estimate two population parameters
      • Slope and intercept OR
      • Mean of X and mean of Y
    • A regression with n = 2 has no df (two points define a line exactly, so every residual is zero)
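A minimal Python sketch of the calculation follows, assuming the usual definition: the square root of the summed squared residuals divided by n − 2. The function name and data are hypothetical.

```python
import math

def standard_error_of_estimate(xs, ys):
    """sqrt( sum of squared residuals / (n - 2) ) for a least squares line."""
    n = len(xs)
    if n <= 2:
        # With two points the line fits perfectly and no df remain.
        raise ValueError("need more than two observations (df = n - 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sp_xy / ss_x
    a = mean_y - b * mean_x
    ss_resid = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_resid / (n - 2))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

see = standard_error_of_estimate(xs, ys)
print(round(see, 4))
```

The guard for n ≤ 2 mirrors the last bullet: with no degrees of freedom left, the divisor would be zero.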
Testing the Model

  • df for the model = # predictors
  • df for the residuals = n minus # model parameters = n minus (1 + # predictors)
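The slide's own test statistic did not survive in this transcript. Assuming it is the usual F ratio, (SS_model / df_model) / (SS_residual / df_residual), the degrees of freedom above can be put to work as in this Python sketch for a single predictor. The function name and data are hypothetical, and turning F into a p-value would additionally need an F distribution, which is omitted here.

```python
def f_ratio(xs, ys):
    """Return (F, df_model, df_resid) for a one-predictor least squares fit."""
    n = len(xs)
    n_predictors = 1
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sp_xy / ss_x
    a = mean_y - b * mean_x

    predicted = [b * x + a for x in xs]
    ss_model = sum((p - mean_y) ** 2 for p in predicted)
    ss_resid = sum((y - p) ** 2 for y, p in zip(ys, predicted))

    df_model = n_predictors              # number of predictors
    df_resid = n - (1 + n_predictors)    # n minus (intercept + predictors)
    f = (ss_model / df_model) / (ss_resid / df_resid)
    return f, df_model, df_resid

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

f, df_model, df_resid = f_ratio(xs, ys)
print(round(f, 3), df_model, df_resid)
```

With one predictor the residual df reduces to n − 2, matching the df used for the standard error of estimate.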

Online Applets
  • Explaining variance
    • http://www.duxbury.com/authors/mcclellandg/tiein/johnson/reg.htm
  • Leverage
    • http://www.stat.sc.edu/~west/javahtml/Regression.html
  • Distribution of slopes/intercepts
    • http://lstat.kuleuven.be/java/version2.0/Applet003.html