
Causality and confounding variables


yehudah

Presentation Transcript


  1. Causality and confounding variables • Scientists aspire to measure cause and effect • Correlation does not imply causality. Hume: contiguity + order (cause then effect) + effect only when cause present • Confounding variables (extraneous factors) may intervene and affect both the proposed cause and the proposed effect.

  2. Correlation and Regression • Steps for making statistical predictions • Pearson product moment coefficient of correlation (r) – to measure the strength of any linear relationship between variables – e.g. in bivariate correlation: age and salary level • Lies in the range -1 ≤ r ≤ +1 • -1 perfect negative linear correlation; +1 perfect positive linear correlation; 0 no linear correlation • Measures only the strength of the relationship, not cause and effect
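
As a minimal sketch of how r is obtained, the Python function below computes the Pearson coefficient from the sums of squares and cross-products. The age and salary figures are hypothetical illustration data, not values from the presentation.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration data (not from the slides)
ages = [25, 30, 35, 40, 45, 50]
salaries = [22, 27, 30, 38, 41, 47]  # in thousands

r = pearson_r(ages, salaries)
print(f"r = {r:.3f}")  # close to +1: a strong positive linear relationship
```

A value near +1 or -1 says only that the points lie close to a straight line; it says nothing about which variable, if either, drives the other.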

  3. Steps for making statistical predictions continued… • Having established a correlation (strength) • Use the ‘coefficient of determination’ (r²) to assess what proportion (%) of the variation in one variable is explained by its linear relationship with the other • Evaluate the statistical significance (t-scores) – i.e. set the risk level at which the calculated coefficients are accepted against the null hypothesis • The selection of scatter diagrams (next) illustrates linear correlation principles

  4. A selection of scatter diagrams and associated correlation coefficients

  5. Now move on to prediction • From assessing the strength and power of a linear correlation between two variables • …move on to describing the nature of the relationship to assist in predicting • The equation of a regression line has the form: Y = a + bX, where Y is the dependent variable (the one we wish to predict / explain) and X is the independent variable. • The value “a” is the intercept of the line (the value of Y when X = 0) and “b” is its gradient (the change in Y for each unit increase in X).
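
The slides take a and b from SPSS output. As a sketch of where such values come from, the function below applies the usual least-squares formulas (b is the sum of cross-products divided by the sum of squares of X; a is mean Y minus b times mean X). The function name and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary for a slope to be defined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # gradient
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical illustration data (not from the slides)
ages = [25, 30, 35, 40, 45, 50]
service = [3, 6, 8, 10, 12, 15]  # years of service

a, b = fit_line(ages, service)
print(f"Y = {a:.3f} + {b:.3f}X")
```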

  6. Worked Example • LOS (length of service) and age are correlated at r = 0.87207 in a survey of 30 employees in a firm • r (above) and r² (0.760508) are strong – although this still leaves about 24% of the variation unexplained (i.e. residual variation due to extraneous factors) • Is this significant? • Can we predict mean LOS at age 40? • What is the 95% confidence interval for the additional LOS associated with one extra year of age?
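
One common way to approach “Is this significant?” is to convert r into a t-statistic with n - 2 degrees of freedom. The sketch below assumes that test and uses the r and n quoted on the slide; the critical values 2.048 (5%) and 2.763 (1%) are standard two-tailed t-table entries for 28 degrees of freedom, not figures from the presentation.

```python
import math

def t_from_r(r, n):
    """t-statistic for H0: no linear correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.87207, 30        # values quoted on the slide
r_squared = r ** 2        # coefficient of determination
t = t_from_r(r, n)

# Two-tailed critical values for 28 degrees of freedom (standard t-tables; an assumption here)
T_CRIT_5_PERCENT = 2.048
T_CRIT_1_PERCENT = 2.763

print(f"r squared = {r_squared:.4f}, unexplained = {1 - r_squared:.4f}")
print(f"t = {t:.2f}")
print("significant at 5%:", abs(t) > T_CRIT_5_PERCENT)
print("significant at 1%:", abs(t) > T_CRIT_1_PERCENT)
```

Under these assumptions t is roughly 9.4, well beyond both critical values, so the correlation would be judged significant at the 1% level.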

  7. Plotting the data we can see… • The equation of the line linking length of service (Y) and age (X) is: Y = -8.2194 + 0.45727X; SPSS reports these coefficients for us. • This equation can be used to predict LOS at a selected age.

  8. Where do the figures come from to drop into the Y=a+bX equation? An SPSS regression printout gives us the data needed to solve the problem:

  9. Interpretation of the SPSS output • Variables Entered/Removed: this simply tells us that ‘age’ was the independent variable and ‘service’ the dependent variable. • Model summary: the value of the correlation coefficient (r) was 0.872 and the value of r² was 0.761. • Coefficients: the ‘unstandardized coefficients’ give us the values of a and b in the regression equation. Thus the equation here is Y = -8.219 + 0.457X. The final column, ‘Sig.’, gives values less than 0.01, so we can say that the coefficients of the regression equation are significantly different from zero at the 1% (0.01) level (and thus at the 5% (0.05) level). • Casewise diagnostics: during the input dialogue, SPSS was asked to show any standardised residuals outside the range -3 to +3. The output shows that one reading, case number 2, had a large standardised residual. This indicates that this point does not fit the general trend of the straight line and can be regarded as an ‘outlier’ (i.e. an unusual reading).

  10. The solution… Y = a + bX (where Y is LOS; X is age) • Y = -8.2194 + 0.45727X • Y = -8.2194 + 0.45727(40) • Y = -8.2194 + 18.29 • Y = 10.07 years’ service predicted at age 40* • And … we can be 95 per cent confident that the mean additional LOS for each extra year of age lies in the range 0.358 to 0.557 years (as supplied in the SPSS output). * Have a glance back at the scattergram to check this visually
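
The arithmetic on this slide can be checked with a few lines of Python. The coefficients and the interval are the ones quoted from the SPSS output; the function name is hypothetical, and the critical value 2.048 (two-tailed 5%, 28 degrees of freedom) is a standard t-table entry assumed here to back out the slope's approximate standard error.

```python
A = -8.2194   # intercept quoted from the SPSS output
B = 0.45727   # slope: extra years of service per extra year of age

def predict_los(age):
    """Predicted mean length of service (years) at a given age."""
    return A + B * age

los_at_40 = predict_los(40)
print(f"Predicted LOS at age 40: {los_at_40:.2f} years")

# 95% confidence interval for the slope, as quoted on the slide
ci_low, ci_high = 0.358, 0.557
midpoint = (ci_low + ci_high) / 2      # should sit close to B
half_width = (ci_high - ci_low) / 2

T_CRIT = 2.048                         # assumed t-table value, 28 degrees of freedom
implied_se = half_width / T_CRIT       # rough standard error of the slope
print(f"Interval midpoint: {midpoint:.4f}, implied SE of slope: {implied_se:.4f}")
```

The interval is centred on the slope estimate, which is what one would expect if SPSS built it as b plus or minus a t multiple of the slope's standard error.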

  11. Basic Quants: A Summary • We have introduced the modelling concept • We have reflected on data types/displays • We have engaged with probability theory • We have touched on: significance testing of hypotheses using both parametric and non-parametric statistics; and prediction from what is known to make an informed estimate of the variable of interest • Work through the assignment with the booklet provided alongside, which will guide you through the solution of every aspect!
