
Correlation and Regression-I


Presentation Transcript


  1. Correlation and Regression-I QSCI 381 – Lecture 36 (Larson and Farber, Sect 9.1)

  2. Overview-I • Many of the questions commonly addressed in the natural sciences relate to whether there is a (significant) (linear) relationship between two variables. [Figures: positive linear correlation; negative linear correlation]

  3. Overview-II • There may be no correlation between two variables, or a non-linear relationship. [Figures: no relationship; non-linear relationship]

  4. Overview-III • We will be looking at two questions this week: • Is there really a relationship between two variables (the correlation problem)? • How do we predict values for one variable, given particular values for the other variable (the regression problem)?

  5. Independent and Dependent Variables • A correlation is a relationship between two variables. The data can be represented by the ordered pairs (x, y), where x is the independent, or explanatory, variable and y is the dependent, or response, variable.

  6. Correlation Coefficient-I • Correlations can be examined using scatterplots. However, these are subjective. The sample correlation coefficient, r, is a measure of the strength and the direction of a linear relationship between two variables: r = Σ(x−x̄)(y−ȳ) / √[Σ(x−x̄)² · Σ(y−ȳ)²]. • The population correlation coefficient is denoted ρ.
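The deviation-score formula above can be computed directly. A minimal sketch in Python (the data values are made up for illustration, not taken from the lecture):

```python
import math

def sample_correlation(x, y):
    """Pearson sample correlation coefficient r for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Perfectly linear increasing data gives r = 1
print(sample_correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

A perfectly decreasing relationship would give r = −1, matching the stated range of the coefficient.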

  7. Correlation Coefficient-II • The correlation coefficient ranges between -1 and 1. [Figures: scatterplots with r=0.9481, r=-0.9306, and r=-0.0612]

  8. Example – Calculating r Recruitment in a fish stock must relate in some way to the environment. Assess the level of correlation between recruitment and Sea Surface Temperature (SST).

  9. Example – Calculating r In EXCEL, we can calculate correlation coefficients using the formula: CORREL(A1:A10,B1:B10)

  10. Testing the Significance of a Correlation Coefficient-I • r is a statistic – it is based on a paired sample from the population. • We want to infer the significance of the population correlation coefficient, ρ, from the sample correlation coefficient, r.

  11. Testing the Significance of a Correlation Coefficient-II • State H0 and Ha and specify α. • Set d.f.=n-2 and find the critical values of the t-distribution with n-2 degrees of freedom. • Find the standardized test statistic. • Make a decision to reject or fail to reject the null hypothesis.
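The steps above can be sketched as follows. The critical value is hard-coded for α=0.05 and d.f.=12 (as in the example that follows), and the r value is a hypothetical stand-in, since the computed value from the lecture's data set is not reproduced here:

```python
import math

def t_statistic(r, n):
    """Standardized test statistic for H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

n = 14          # number of (x, y) pairs
r = 0.76        # hypothetical sample correlation coefficient
t_crit = 2.179  # two-tailed t critical value, alpha = 0.05, d.f. = 12

t = t_statistic(r, n)
if abs(t) > t_crit:
    print("Reject H0: the correlation is significant")
else:
    print("Fail to reject H0")
```

With r = 0.76 and n = 14, |t| exceeds 2.179, so the null hypothesis of no correlation would be rejected; a small r such as 0.1 would not be significant with this sample size.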

  12. Example-I • Test the hypothesis ρ=0 for the SST – Recruitment data set. Assume that α=0.05. • H0: ρ=0 (no correlation); Ha: ρ≠0. • α=0.05 – this is a two-tailed test with d.f.=14-2=12 degrees of freedom. • The rejection region is |t|>2.179. • The standardized test statistic is: t = r√(n−2)/√(1−r²). • We reject the null hypothesis of no correlation.

  13. Example-II • Using the same data as for Example I, examine the claim: "Recruitment is higher when the temperature is higher". Note that this claim was developed before the data were collected. • Note: whether a correlation coefficient is significant or not can also be examined using tables (e.g. Table 11 of Appendix B). You can construct this table using EXCEL.

  14. Correlation and Causation-I • Perhaps the biggest mistake that can be made when analyzing data is to confuse correlation and causation. The fact that two variables are strongly correlated does not in itself imply a cause-and-effect relationship between the variables.

  15. Correlation and Causation-II • Evaluating whether causation can be inferred is not trivial – infer causation only after thinking about the following: • Did I think of the relationship between these variables before I saw the data? • Is there a good theoretical reason for a direct cause-and-effect relationship between the variables? • Is there perhaps a reverse cause-and-effect relationship between the variables? • Is there perhaps a third (or more) variable(s) which determine(s) BOTH x and y? • Is the relationship just coincidental?

  16. Correlation and Causation-III • The danger of too much data. • We spend five months in the field collecting data to examine the question of the relationship between the density of sparrows and the environment. Thanks to the internet we identify 120 possible environmental data series. We correlate the density estimates with each. What should we expect to find by doing this?
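One way to see what to expect is by simulation. A sketch assuming all 120 environmental series are independent random noise, so every "significant" correlation is a Type I error; the critical |r| of 0.532 corresponds to α=0.05 with n=14 pairs (as in the earlier example) and the series lengths and seed are arbitrary choices:

```python
import random

random.seed(1)

def pearson_r(x, y):
    """Pearson sample correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

n = 14          # observations per series
r_crit = 0.532  # |r| needed for significance at alpha = 0.05, d.f. = 12
density = [random.gauss(0, 1) for _ in range(n)]  # "sparrow density"

# Correlate the density series with 120 unrelated environmental series
hits = sum(
    1 for _ in range(120)
    if abs(pearson_r(density, [random.gauss(0, 1) for _ in range(n)])) > r_crit
)
print(hits, "of 120 correlations look 'significant' by chance")
```

Since none of the series are actually related, roughly 120 × 0.05 = 6 spurious "significant" correlations should turn up on average.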

  17. Correlation and Causation-IV • [Figure: correlations flagged as significant at α=0.05.] With 120 independent tests at α=0.05, we expect roughly 120×0.05 = 6 "significant" correlations by chance alone. • Type I Error: probability of rejecting the null hypothesis when it is in fact true!

  18. Correlation and Causation-V • "Looking for" variables that correlate with a variable of interest is variously called "data mining", "data dredging", or "statistical fishing".
