
Review

gavivi

Presentation Transcript


  1. Review
     Guess the correlation.
     Options: -2.0, -0.9, -0.1, 0.1, 0.9
  2. Review
     Calculate the correlation between X and Y.
     zX = [-1.0, 0.3, -0.2, -0.6, 1.6]
     zY = [-0.6, 1.3, 0.6, 0.0, -1.3]
     Options: -1.21, -.30, -.24, -.01
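The calculation in this review question can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code. It assumes the spaced dashes in the transcript are extraction residue, so that those values are positive (both lists then sum to roughly zero, as z-scores should). The function name `correlation_from_z` and its `sample` flag are hypothetical. Whether to divide by n - 1 or n depends on how the z-scores were standardized, which the slide does not state, so both are shown.

```python
# z-scores from the review slide; the signs are a reconstruction (assumption)
z_x = [-1.0, 0.3, -0.2, -0.6, 1.6]
z_y = [-0.6, 1.3, 0.6, 0.0, -1.3]


def correlation_from_z(zx, zy, sample=True):
    """Correlation as the average product of paired z-scores.

    sample=True divides by n - 1 (z-scores built from the sample SD);
    sample=False divides by n (z-scores built from the population SD).
    """
    if len(zx) != len(zy):
        raise ValueError("zx and zy must have the same length")
    n = len(zx)
    total = sum(a * b for a, b in zip(zx, zy))
    return total / (n - 1 if sample else n)


# The raw sum of products is not itself a correlation; it still has to be averaged.
sum_of_products = sum(a * b for a, b in zip(z_x, z_y))
r_sample = correlation_from_z(z_x, z_y, sample=True)
r_population = correlation_from_z(z_x, z_y, sample=False)

print(round(sum_of_products, 2), round(r_sample, 2), round(r_population, 2))
```

Under these assumptions the unaveraged sum of products and the two averaged versions each line up with a different answer option.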
  3. Review
     Which statement is true?
     - If the correlation is zero, then the variables are independent
     - If the correlation is nonzero, then the variables are independent
     - If two variables are independent, then their correlation is zero
     - If two variables have a nonlinear relationship, then their correlation is zero
  4. Regression

    11/5
  5. Regression
     - Correlation can tell how to predict one variable from another
     - What about multiple predictor variables?
       - Explaining Y using X1, X2, … Xm
       - Income based on age, education
       - Memory based on pre-test, IQ
       - Social behavior based on personality dimensions
     - Regression
       - Finds best combination of predictors to explain outcome variable
       - Determines unique contribution of each predictor
       - Can test whether each predictor has reliable influence
  6. Linear Prediction
     - One predictor
       - Draw a line through data
     - Multiple predictors
       - Each predictor has linear effect
       - Effects of different predictors simply add together
     - Intercept: b_0
       - Value of Y when all Xs are zero
     - Regression coefficients: b_i
       - Influence of X_i
       - Sign tells direction; magnitude tells strength
       - Can be any value (not standardized like correlation)
  7. Example
     - Predict income from education and intelligence
       - Y = $k/yr
       - X1 = years past HS
       - X2 = IQ
     - Regression equation: Ŷ = -70 + 10*X1 + 1*X2
     - 4-year college, IQ = 120
       - -70 + 10*4 + 1*120 = 90 → $90k/yr
     - 2-year college, IQ = 90
       - -70 + 10*2 + 1*90 = 40 → $40k/yr
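The two worked cases on this slide amount to plugging values into a linear equation. A small sketch follows, assuming the coefficients implied by the slide's arithmetic (intercept -70, 10 per year of schooling past high school, 1 per IQ point). The function and parameter names are illustrative only.

```python
def predict_income(years_past_hs, iq, b0=-70.0, b_edu=10.0, b_iq=1.0):
    """Predicted income in $k/yr: intercept plus the additive effect of each predictor."""
    return b0 + b_edu * years_past_hs + b_iq * iq


four_year = predict_income(years_past_hs=4, iq=120)  # -70 + 40 + 120 = 90.0
two_year = predict_income(years_past_hs=2, iq=90)    # -70 + 20 + 90 = 40.0

print(four_year, two_year)
```

Because the effects simply add, changing one predictor while holding the other fixed shifts the prediction by that predictor's coefficient times the change.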
  8. Finding Regression Coefficients
     - Goal: regression equation with best fit to data
       - Minimize squared error between Y and Ŷ (the predicted value)
     - Math solution
       - Matrix algebra
       - Combines Xs into a matrix
       - Rows for subjects, columns for variables
       - Transposes, matrix inverse, derivatives and other fun
     - Practical solution
       - Use a computer
       - R: linear model function, lm(Y~X1+X2+X3)
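To make the "math solution" concrete, here is a rough pure-Python sketch of least squares via the normal equations, (X'X) b = X'Y, solved by Gauss-Jordan elimination. It is an illustration under stated assumptions, not how R's `lm` works internally. The function name `fit_least_squares` is hypothetical, and the data are synthetic, generated exactly from the income equation used in the example slide so that the fit should recover -70, 10, and 1.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bm] minimizing the squared error between Y and its prediction.

    rows: one list of predictor values per subject (rows = subjects, columns = variables).
    Assumes the predictors are not perfectly collinear (otherwise a pivot is zero).
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 gives the intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic subjects: (years past HS, IQ); incomes follow the slide's equation exactly
subjects = [(0, 100), (2, 90), (4, 120), (4, 110), (6, 130), (2, 105)]
income = [-70 + 10 * edu + 1 * iq for edu, iq in subjects]

b0, b_edu, b_iq = fit_least_squares(subjects, income)
print(round(b0, 6), round(b_edu, 6), round(b_iq, 6))
```

In practice a statistics package handles this step; the sketch only shows that the coefficients fall out of a system of linear equations built from the data matrix.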
  9. Regression vs. Correlation
     - One predictor: regression = correlation
       - Plus commitment about which variable predicts the other
     - Multiple predictors
       - Predictors affect each other
       - Each coefficient depends on what other predictors are included
     - Example: 3rd-variable problem
       - Math achievement and verbal achievement both depend on overall IQ
       - Math helps with English?
       - Add IQ as a predictor: verbal ~ math + IQ
       - b_math will be near 0
       - No effect of math once IQ is accounted for
     - Regression finds best combination of predictors to explain outcome
       - Best values of b_1, b_2, etc. taken together
       - Each coefficient shows contribution of predictor beyond effects of others
       - Allows identification of most important predictors
  10. Explained Variability
      - Total sum of squares
        - Variability in Y
        - $SS_Y = \sum (Y - M_Y)^2$
      - Residual sum of squares
        - Deviation between predicted and actual scores
        - $SS_{residual} = \sum (Y - \hat{Y})^2$
  11. Explained Variability
      - SS_regression
        - Variability explained by regression
        - Difference between total and residual sums of squares
        - $SS_{regression} = SS_Y - SS_{residual}$
      - $R^2$ ("R-squared")
        - Fraction of variability explained by the regression
        - $R^2 = SS_{regression} / SS_Y$
        - Measures how well the predictors (Xs) can predict or explain the outcome (Y)
        - Same as $r^2$ from correlation when only one predictor
      - [Figure labels: SS_regression, SS_Y, SS_residual]
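The sums of squares defined on these two slides can be computed directly from observed and predicted scores. The sketch below uses small made-up numbers (not from the lecture), and the function name `sums_of_squares` is hypothetical. The predictions are the least-squares fit for a two-group predictor, that is, each subject's group mean.

```python
def sums_of_squares(y, y_hat):
    """Return (SS_Y, SS_residual, SS_regression) for observed y and predicted y_hat."""
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)               # variability in Y
    ss_residual = sum((v - p) ** 2 for v, p in zip(y, y_hat))  # unexplained part
    ss_regression = ss_total - ss_residual                     # explained part
    return ss_total, ss_residual, ss_regression


# Hypothetical data: two groups with means 3 and 7
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [3.0, 3.0, 7.0, 7.0]

ss_y, ss_res, ss_reg = sums_of_squares(y, y_hat)
r_squared = ss_reg / ss_y  # fraction of variability explained

print(ss_y, ss_res, ss_reg, r_squared)
```

Here the total variability is 20, of which 4 is left over after prediction, so the regression accounts for 16/20 of it.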
  12. Hypothesis Testing with Regression
      - Does the regression explain meaningful variance in the outcome?
      - Null hypothesis
        - All regression coefficients equal zero in population
        - Regression only "explaining" random error in sample
      - Approach
        - Find how much variance the predictors explain: SS_regression
        - Compare to amount expected by chance
        - Ratio gives F statistic, which we can use for hypothesis testing
        - If F is larger than expected by H0, regression explains more variance than expected by chance
        - Reject null hypothesis if F is large enough
  13. Hypothesis Testing with Regression
      - Likelihood function for F is an F distribution
        - Comes from ratio of two chi-square variables
        - Two df values, from numerator and denominator
      - [Figure labels: SS_regression, SS_Y, SS_residual; P(F_5,10), α, p, F_crit, F]
  14. Degrees of Freedom (m = number of predictors, n = number of subjects)
      - SS_regression: df_regression = m
      - SS_Y: df_Y = n - 1
      - SS_residual: df_residual = n - m - 1
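Putting the sums of squares and degrees of freedom together gives the F statistic. The slide's own formula did not survive in the transcript, so this sketch assumes the standard definition: each sum of squares is divided by its degrees of freedom to give a mean square, and F is the ratio MS_regression / MS_residual. The function name and the numbers are hypothetical.

```python
def f_statistic(ss_total, ss_residual, n, m):
    """F test for the whole regression with n subjects and m predictors.

    Returns (F, df_regression, df_residual).
    """
    df_regression = m
    df_residual = n - m - 1
    if df_residual <= 0:
        raise ValueError("need more subjects than predictors plus one")
    ms_regression = (ss_total - ss_residual) / df_regression  # explained variance per df
    ms_residual = ss_residual / df_residual                   # error variance per df
    return ms_regression / ms_residual, df_regression, df_residual


# Hypothetical study: 23 subjects, 2 predictors, SS_Y = 200, SS_residual = 100
f_value, df1, df2 = f_statistic(ss_total=200.0, ss_residual=100.0, n=23, m=2)
print(f_value, df1, df2)
```

The resulting F would then be compared against the critical value of an F distribution with (df_regression, df_residual) degrees of freedom.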
  15. Testing Individual Predictors
      - Does predictor variable have any effect on outcome?
        - Does it provide any information beyond other predictors?
      - Null hypothesis: b_i = 0 in population
        - Other bs might be nonzero
      - Compute standard error for b_i
        - Uses MS_residual to estimate the error variance
      - t statistic: $t = b_i / SE_{b_i}$
        - Get t_crit or p-value from t distribution with df_residual
        - Can do a one-tailed test if the direction of the effect (the sign of b_i) is predicted
  16. Review
      Predicting income from education and IQ:
      A high-school graduate with IQ = 125 considers getting a 4-year bachelor's degree. What is her expected increase in income?
      Options: $4k, $40k, $95k, $165k
  17. Review
      A study investigating personality differences in speech rate measures 100 subjects on 5 personality dimensions. These dimensions are used to predict words per minute (WPM) in a timed speech. The total variability in the outcome variable is SS_WPM = 12000. The residual variability after accounting for the personality predictors is SS_residual = 9000.
      What is R^2 for the regression?
      Options: 0.06, 0.25, 0.75, 30, 3000
  18. Review
      A study investigating personality differences in speech rate measures 100 subjects on 5 personality dimensions. These dimensions are used to predict words per minute (WPM) in a timed speech. The total variability in the outcome variable is SS_WPM = 12000. The residual variability after accounting for the personality predictors is SS_residual = 9000.
      Calculate the F statistic for testing whether personality predicts speech rate. (df_regression = 5, df_residual = 94)
      Options: 0.02, 0.33, 4.70, 6.27