Here, pal! Regress this! presented by Miles Hamby, PhD Research & Training Consultants MilesFlight.com
Here, pal! Regress this! presented by Miles Hamby, PhD Director of Institutional Research & Assessment Strayer University 202-419-0402 mile.hamby@strayer.edu
Typical – Descriptive Statistics
• Frequencies – numbers of things
  • eg – 70 out of 340 (21%) of female students have graduated over the last 6 years
• Mean – measure of central tendency
  • eg – The average time to complete an academic program for students with 12 hours transfer credit is 36 terms.
• Standard Deviation – measure of dispersion
  • eg – 68% of completing students graduate between 25 and 42 terms
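A minimal sketch of these three descriptive statistics in Python. The 70-out-of-340 figure comes from the slide; the list of completion times is hypothetical, made up only to illustrate the mean and standard deviation.

```python
from statistics import mean, stdev

# Frequency: share of female students who graduated (figures from the slide).
graduated, enrolled = 70, 340
pct_graduated = 100 * graduated / enrolled

# Hypothetical terms-to-completion for five students (illustrative only).
terms = [30, 34, 36, 38, 42]
avg_terms = mean(terms)    # measure of central tendency
sd_terms = stdev(terms)    # measure of dispersion (sample standard deviation)

print(f"{pct_graduated:.0f}% graduated; mean = {avg_terms} terms; SD = {sd_terms:.1f} terms")
```

These numbers describe the students already observed; nothing here says what the next student will do.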
Shortcoming of Descriptive Statistics
They can tell you what it is, but they can’t tell you what it will be.
They do not predict.
Regression predicts!
• eg – Can we predict how many female students will graduate and when?
• Can we predict when a student with no transfer credit will graduate?
• Can we predict the likelihood of graduation of a student based on gender?
How to Use Regression to Predict
Question –
• What kind of student takes the longest time to graduate?
• What kind of student never graduates?
Typical way –
• Start with specific cohort (eg, Fall 1993)
• Select a single group (eg, 1-12 transfer credits)
• Count number who graduate each term
• Compute percentage ~ 25 graduated / 100 started = 25%
Conclusion – For Fall 93 cohort, graduation rate = 25% after 12 terms for those with 1-12 transfer credits
Exiguousness of Typical Method –
• DV implied, not specified (and therefore not tested)
• Does not measure strength of association (correlation) to graduation time or amount of effect (slope) on graduation time
  • eg – compare age’s effect to transfer credits’ effect
• Graduation rate does not predict time-in-program or time-to-completion, or even whether or not one will graduate
• Must repeat procedure for each time block
Typical Method, e.g.
Variable        Time to Graduation
Females         X̄ = 16 terms, S = 5 terms
1-12 Xfer Cr    X̄ = 13 terms, S = 4 terms
Married         X̄ = 18 terms, S = 9 terms
Time to graduation for each variable is not discrete: it includes all other variables
But how about a single, black man with 17 transfer credits?
Must repeat procedure for single students, then repeat for black students, then repeat for males, then repeat for 13 – 20 transfer credits, then ‘eyeball’ how they correlate.
Is there a way to determine how much of the 16 terms time for females (previous ex.) would be ameliorated by being a single, black male with 17 transfer credit hours?
There is a way! Regress it!
Effects of gender, age, transfer credits, marital status, citizenship, ethnicity, and more, directly on time to complete are measurable and comparable.
Pick a profile and I’ll tell you how long it will take for that student to graduate!
Procedure –
1. Identify dependent variable (DV), i.e., the question you are asking – eg, Time to Graduate (Time)
2. Identify independent variables (IV) that possibly affect graduation time – gender, ethnicity, marital status, age, transfer credits, income
3. Collect data
4. Run linear regression to determine:
   (a) correlations between Time and IVs
   (b) significance of difference in means of IVs
   (c) regression model (y = a + b1X1 + … + bnXn) to predict Time by IVs
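Steps 4(a) and 4(c) can be sketched for the simplest case of one IV. This is plain Python with hypothetical data (five made-up students); the names `xfer` and `terms` are illustrative, and a real analysis with several IVs would be run in a statistics package, as the deck does with SPSS.

```python
from math import sqrt

# Hypothetical data: transfer credits (IV) and terms to graduate (DV).
xfer = [0, 6, 12, 18, 24]
terms = [36, 35, 33, 32, 30]

n = len(xfer)
mean_x = sum(xfer) / n
mean_y = sum(terms) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in xfer)
syy = sum((y - mean_y) ** 2 for y in terms)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xfer, terms))

r = sxy / sqrt(sxx * syy)   # (a) correlation between Time and the IV
b = sxy / sxx               # (c) slope: change in terms per transfer credit
a = mean_y - b * mean_x     # (c) constant: predicted terms at 0 credits

print(f"r = {r:.3f}; Time = {a:.1f} + ({b:.2f})*Xfer")
```

With these made-up numbers the slope is negative: each additional transfer credit is associated with a slightly shorter time to graduate.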
Regression can tell you everything!
EG – For a single male, age 32, with 18 transfer credits, we can expect a graduation time of 32 terms
# Terms = a + .4*Marital + .2*Gender + .06*Age - .18*Xfer
# Terms = 33 terms + .4*0 + .2*0 + .06*32 - .18*18
32 terms ≈ 33 terms + 0 + 0 + 2 - 3
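The arithmetic of this example can be checked with a short sketch. The coefficients are the ones in the slide's general form (.4, .2, .06, -.18) with a constant of 33 terms; the function name `predict_terms` is just an illustrative label.

```python
# Coefficients taken from the slide's general form; constant a = 33 terms.
INTERCEPT = 33.0
COEF = {"marital": 0.4, "gender": 0.2, "age": 0.06, "xfer": -0.18}

def predict_terms(marital, gender, age, xfer):
    """Return the predicted number of terms to graduate for one profile."""
    return (INTERCEPT
            + COEF["marital"] * marital
            + COEF["gender"] * gender
            + COEF["age"] * age
            + COEF["xfer"] * xfer)

# Single (0), male (0), age 32, 18 transfer credits.
terms = predict_terms(marital=0, gender=0, age=32, xfer=18)
print(round(terms, 2))
```

The result is just under 32 terms, which the slide rounds term by term (.06*32 to 2, .18*18 to 3).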
Adding Variables
DV ~ Time to Graduation (# terms - ratio)
IV ~
• Gender (F or M - nominal)
• Ethnic (B, H, W, NA, API, Alien, Unk - nominal)
• Alien (Alien or US - nominal)
• Marital status (si, ma, di – nominal)
• Age (# years - ratio)
• Transfer credits (# hours - ratio)
• Tutoring done (# sessions – ratio; Y/N – nominal)
Coding Your Variables
Scale (ratio) variables (time to completion, age, etc) – use number directly
• eg, Age = 32 years, use ‘32’
• Time to Comp (terms) = 12 terms, use ‘12’
Coding Your Variables
Nominal Variables – use ‘dummies’
What are Dummy Variables?
Variables used to quantify nominal variables, i.e., nominal (qualitative) variables are assigned a number and treated as quantitative variables.
Dummy Variables
Dichotomous variable – two categories
• eg – Male or Female
• Married or Single
• Has had tutoring or hasn’t
• US Citizen or Alien
• Graduate student or Undergrad
Polychotomous variable – several categories of the variable
• eg – Ethnic – African-American, Hispanic, White
• Major – Bus, Account, Computers, English, LA
• Religion – Christian, Jewish, Muslim, Hindu
Dummy Variables
• eg, ‘Gender’
  • Code Male = 0, Female = 1 (or vice-versa)
  • 1 = ‘presence of characteristic’ (femaleness)
  • 0 = ‘absence of characteristic’
• ‘Ethnic’
  • Make B, NA/AN, W, API, H, Unk unique variables
  • Code as 1 = ‘presence of characteristic’ (‘Black’-ness) or 0 = ‘absence of characteristic’
Dummy Variables
Alien: 1 = yes, 0 = no
Marital: 1 = MA/DI, 0 = SI
Gender: 1 = F, 0 = M
Age: number of years
Transfer credits: number
B: 1 = yes, 0 = no
AN: 1 = yes, 0 = no
W: 1 = yes, 0 = no
API: 1 = yes, 0 = no
H: 1 = yes, 0 = no
Unk: 1 = yes, 0 = no
# Terms = 3 terms + .2*1 + .3*32 + 1.2*10 + .4*3
As Used in the Regression
e.g. ~ Black, US Citizen, female, married, 32 years old, 10 transfer credits:
# Terms = 32 terms
  + [.2*1 + .2*0 + .2*0 + .2*0] (ethnic)
  + .5*0 (Alien)
  + .4*1 (marital)
  + .2*1 (gender)
  + .06*32 (age)
  - .18*10 (xfer credits)
SEX  GENDR  TUTSES  TUTRD  LEVEL  U/G  MARITL  MARIT  VISA   ALIEN
F    1      3       1      U      1    SI      0      F-1    1
F    1      2       1      U      1    SI      0      US     0
M    0      1       1      U      1    DI      1      US     0
F    1      0       0      G      0    MA      1      P-R    1
M    0      1       1      U      1    MA      1      GREEN  1
M    0      0       0      U      1    SI      0      US     0
F    1      0       0      G      0            9      F-4    1
• Nominal Variables – Dichotomous – 2 values
• Create new column for dummy variable or recode original
• 1 = presence of characteristic of interest
• 0 = not the characteristic of interest (absence of characteristic)
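The recoding in this table can be sketched as a small function. The column names follow the table; the rules (any tutoring session counts as 1, any visa status other than US counts as Alien = 1) are read off the rows shown and are an assumption about how the original data were coded. Missing values are not handled here.

```python
def dummy_code(record):
    """Return 0/1 dummy variables for one student's raw record."""
    return {
        "GENDR": 1 if record["SEX"] == "F" else 0,              # 1 = female
        "TUTRD": 1 if record["TUTSES"] > 0 else 0,              # 1 = had any tutoring
        "U/G": 1 if record["LEVEL"] == "U" else 0,              # 1 = undergraduate
        "MARIT": 1 if record["MARITL"] in ("MA", "DI") else 0,  # 1 = married/divorced
        "ALIEN": 0 if record["VISA"] == "US" else 1,            # 1 = not a US citizen
    }

# First row of the table.
row = {"SEX": "F", "TUTSES": 3, "LEVEL": "U", "MARITL": "SI", "VISA": "F-1"}
coded = dummy_code(row)
print(coded)
```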
MAJOR  ACC  BUS  CIS  ETHNIC  1BLACK  2NATAM  3WHITE  4ASIAN  5HISP  0UNKN
ACC    1    0    0    1       1       0       0       0       0      0
ACC    1    0    0    5       0       0       0       0       1      0
BUS    0    1    0    3       0       0       1       0       0      0
CIS    0    0    1    0       0       0       0       0       0      1
BUS    0    1    0    1       1       0       0       0       0      0
CIS    0    0    1    2       0       1       0       0       0      0
ACC    1    0    0    4       0       0       0       1       0      0
• Nominal Variables – more than 2 values
• Create new columns for dummy variables – one for each value
• 1 = presence of characteristic (value)
• 0 = absence of characteristic
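A sketch of the same coding for a variable with more than two values. The helper `one_hot` and its `reference` argument are illustrative names, not part of the deck. The option to leave one category out reflects standard practice: the Model 3 results later in the deck report White and Computers with B = 0, which suggests they served as the comparison (reference) categories, though the deck does not say so explicitly.

```python
# Numeric ETHNIC codes and their dummy-column names, as in the table.
ETHNIC_CODES = {1: "BLACK", 2: "NATAM", 3: "WHITE", 4: "ASIAN", 5: "HISP", 0: "UNKN"}

def one_hot(value, categories, reference=None):
    """One 0/1 column per category; the reference category, if given, gets no column."""
    return {c: int(c == value) for c in categories if c != reference}

label = ETHNIC_CODES[5]                                   # code 5 -> "HISP"
full = one_hot(label, ETHNIC_CODES.values())              # one column per value
with_ref = one_hot(label, ETHNIC_CODES.values(), reference="WHITE")

print(full)
print(with_ref)
```

Each student has exactly one 1 across the full set of columns; with a reference category, a student in that category is all zeros.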
Run the Regression
SPSS: ANALYZE / REGRESSION / LINEAR
• Move DV to Dependent
• Move first model IVs to Independent, then NEXT
• Move 2nd model IVs to Independent, then NEXT (for a further model) or STATISTICS
• In STATISTICS, check Model Fit, Descriptives, R Squared Change, then Continue / OK
Variable Correlations
[Correlation table not reproduced; values shown: .005, .338]
Note – although all variables show correlation to each other, the correlation (R) may not be significant
The Regression ANOVA
Test of significance of the F statistic indicates all three regression models are statistically significant (Sig. < .05), i.e., the variation is unlikely to be by chance; another set of data would probably show the same results.
The Regression ANOVA
F = 893.215 / 38.960 = 22.926
The larger the F (ratio of the mean square of the Regression to the mean square of the Error/Residual), the more robust the regression equation. I.e., a smaller mean square residual indicates smaller error, or departure from the regression line.
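The F ratio on this slide is a single division, sketched here with the two mean squares from the ANOVA table (the variable names are illustrative):

```python
ms_regression = 893.215   # mean square of the Regression (from the slide)
ms_residual = 38.960      # mean square of the Error/Residual (from the slide)

# F is the ratio of explained to unexplained variation per degree of freedom.
f_stat = ms_regression / ms_residual
print(round(f_stat, 3))
```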
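Variation about the Regression Line
[Figure: two panels, Model 1 and Model 2, each showing the error between observed y and predicted ŷ about the regression line; vertical axis QTRS to Completion]
Interpretation – Mean Square Error/Residual of Model 1 is > Mean Square Error of Model 2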
The Regression Correlation (R)
Model 3 returns the highest correlation (R = .392), with 15.4% (R² = .154) of the variation in Time to Completion (in Qtrs) being explained by the variables Alien, Ethnicity, Marital status, Gender, Age, Tutoring, Transfer credits, U/G status, and Major.
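The link between R and the percentage of variation explained is just a square, as this short check with the slide's value shows:

```python
r = 0.392            # multiple correlation R reported for Model 3
r_squared = r ** 2   # share of variation in the DV explained by the IVs

print(f"R squared = {r_squared:.3f} ({r_squared:.1%} of variation explained)")
```

Put the other way round, roughly 85% of the variation in time to completion is left unexplained by these variables.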
The Slopes – Model 3 Interpretation
• The older the student, the shorter the time to completion (B = -.117)
Model 3 Slopes Graph – Age
[Graph: QTRS to Completion (Y) against Age, 0 to 70 yrs; constant 35.577; Age B = -.117]
Interpretation – Age slope shallow, slight effect on Qtrs to Completion
The Slopes – Model 3 Interpretation
• The older the student, the shorter the time to completion (B = -.117)
• Married/Divorced tends to shorten completion time (B = -.0405), but is not significant (Sig. = .309, > .05)
Model 3 Slopes Graph – Married/Divorced
[Graph: QTRS to Completion (Y) against 0 (Single) and 1 (Married/Divorced); constant 35.577; Married B = -.0405]
Interpretation – Married/Divorced slope very shallow, and not significant (Sig. = .309 > .05)
The Slopes – Model 3 Interpretation
• The older the student, the shorter the time to completion (B = -.117)
• Married/Divorced tends to shorten completion time (B = -.0405), but is not significant (Sig. = .309, > .05)
• Undergraduates tend to take considerably less time to complete than graduates (B = -3.259)
Model 3 Slopes Graph – Undergraduate vs Graduate
[Graph: QTRS to Completion (Y) against 0 (Graduate) and 1 (Undergraduate); constant 35.577; Undergrad B = -3.259]
Interpretation – Undergraduate slope steep; undergraduates tend to take considerably fewer Qtrs to Completion than graduates
The Slopes – Model 3 Interpretation
• The older the student, the shorter the time to completion (B = -.117)
• Married/Divorced tends to shorten completion time (B = -.0405), but is not significant (Sig. = .309, > .05)
• Undergraduates tend to take considerably less time to complete than graduates (B = -3.259)
• Tutoring shortens time very slightly (B = -.000000471), but is not significant (Sig. = .571)
Model 3 Slopes Graph – Tutoring
[Graph: QTRS to Completion (Y) against 0 (No Tutoring) and 1 (Tutored); constant 35.577; Tutored B = -.000000471]
Interpretation – Tutoring slope essentially flat, and not significant (Sig. = .571 > .05)
The Slopes – Model 3 Interpretation
• Xfer lengthens time very slightly (B = .04285); GPA shortens time (B = -.277) but is not significant (Sig. > .05)
Model 3 Slopes Graph – GPA & Transfer Credits
[Graph: QTRS to Completion (Y) against Xfer, 0 to 150 credits, and GPA, 0 to 4.00; constant 35.577; GPA B = -.277; Xfer B = .04285]
Interpretation – Xfer & GPA slopes very shallow, and GPA not significant (Sig. > .05)
The Slopes – Model 3 Interpretation
• Xfer lengthens slightly; GPA shortens, but not significant
• Female tends to shorten time (B = -.110) over Male
Model 3 Slopes Graph – Gender
[Graph: QTRS to Completion (Y) against 0 (Male) and 1 (Female); constant 35.577; Gender B = -.110]
Interpretation – Female Qtrs to Completion tend to be predictably shorter than Male Qtrs
The Slopes – Model 3 Interpretation
• Xfer lengthens slightly; GPA shortens, but not significant
• Female tends to shorten time (B = -.110) over Male
• Black, Nat Am & Unkn take longer than Whites (+B) (NA not significant); Hisp & Asians tend to take shorter than Whites (-B)
Model 3 Slopes Graph – Ethnicity
[Graph: QTRS to Completion (Y) by ethnicity, relative to White B = 0: Native Am B = .719; Unknown B = .531; Black B = .439; Asian B = -.553; Hispanic B = -.830]
Interpretation – Black, Native American & Unknown tend to take longer than Whites (+B); Hispanic & Asian tend to take shorter than Whites (-B)
The Slopes – Model 3 Interpretation
• Xfer lengthens slightly; GPA shortens, but not significant
• Female tends to shorten time (B = -.110) over Male
• Black, Nat Am & Unkn take longer than Whites (+B); Hisp & Asians tend to take shorter than Whites (-B)
• Alien tends to take less time than US citizen (B = -.618)
Model 3 Slopes Graph – Alien
[Graph: QTRS to Completion (Y) against 0 (US) and 1 (Alien); Alien B = -.618]
Interpretation – Alien tends to take less time than US citizen (B = -.618)
The Slopes – Model 3 Interpretation
• Xfer lengthens slightly; GPA shortens, but not significant
• Female tends to shorten time (B = -.110) over Male
• Black, Nat Am & Unkn take longer than Whites (+B); Hisp & Asians tend to take shorter than Whites (-B)
• Alien tends to take less time than US citizens (B = -.618)
• Acc & Bus have considerable effect (B = 2.638, 2.651); positive relative to CIS slope ‘0’
Model 3 Slopes Graph – Major
[Graph: QTRS to Completion (Y) by major, relative to Computers B = 0: Business B = 2.651; Accounting B = 2.638]
Interpretation – Accounting & Business have the steepest slopes (2.638, 2.651); positive relative to CIS slope ‘0’
The Equation – MODEL 3
IV            B (Slope)
(Constant)    35.577
Age           -.117
Gender        -.110
Married       -4.05E-02
Black         .439
Native Am     .719
Asian         -.553
Hispanic      -.830
Unknown       .531
Alien         -.618
GPA           -.277
Transfer Cr   4.285E-02
Undergrad     -3.259
Tutoring      -4.71E-07
Accounting    2.638
Business      2.651
Y = a + bAge + bGen + bMar + bBlk + bNA + bAsn + bHis + bUnk + bAln + bGPA + bXfer + bUndergrad + bTutor + bAcc + bBus
Y = 35.58 + (-.12)Age + (-.11)Gen + (-.04)Mar + (.44)Black + (.72)NatAm + (-.55)Asian + (-.83)Hisp + (.53)Unk + (-.62)Alien + (-.28)GPA + (.04)Xfer + (-3.26)Under + (-.000000471)Tutor + (2.64)Acc + (2.65)Bus
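To "pick a profile", the Model 3 slopes can be applied directly. The sketch below uses the B values from the table; the dictionary keys, the function name `predict_quarters`, and the example profile are illustrative assumptions, not part of the original analysis.

```python
# Model 3 slopes (B) as listed in the table; constant = 35.577 quarters.
CONSTANT = 35.577
MODEL3 = {
    "Age": -0.117, "Gender": -0.110, "Married": -0.0405,
    "Black": 0.439, "NativeAm": 0.719, "Asian": -0.553,
    "Hispanic": -0.830, "Unknown": 0.531, "Alien": -0.618,
    "GPA": -0.277, "TransferCr": 0.04285, "Undergrad": -3.259,
    "Tutoring": -4.71e-07, "Accounting": 2.638, "Business": 2.651,
}

def predict_quarters(profile):
    """Constant plus B * value for each IV; IVs absent from the profile count as 0."""
    return CONSTANT + sum(b * profile.get(name, 0) for name, b in MODEL3.items())

# Hypothetical profile: Black, US citizen, married, female, age 32, GPA 3.0,
# 10 transfer credits, undergraduate, no tutoring, Accounting major.
student = {"Age": 32, "Gender": 1, "Married": 1, "Black": 1, "GPA": 3.0,
           "TransferCr": 10, "Undergrad": 1, "Accounting": 1}
quarters = predict_quarters(student)
print(f"Predicted time to completion: {quarters:.1f} quarters")
```

Since Model 3 explains about 15% of the variation (R² = .154), a prediction like this is best read as an average for students with that profile rather than a forecast for any one student.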