
Data Analysis


giulio

Presentation Transcript


  1. Data Analysis

  2. A Few Necessary Terms. Categorical Variable: Discrete groups, such as Type of Reach (Riffle, Run, Pool). Continuous Variable: Measurements along a continuum, such as Flow Velocity. What type of variable would “Mottled Sculpin/m²” be? What type of variable is “Substrate Type”? What type of variable is “% of bank that is undercut”?

  3. A Few Necessary Terms. Explanatory Variable: Independent variable. On the x-axis. The variable you use as a predictor. Response Variable: Dependent variable. On the y-axis. The variable that is hypothesized to depend on, or be predicted by, the explanatory variable.

  4. Statistical Tests: Appropriate Use. For our data, the response variable will always be continuous. T-test: A categorical explanatory variable with 2 options. ANOVA: A categorical explanatory variable with >2 options. Regression: A continuous explanatory variable.
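
The decision rule on this slide can be written as a small lookup. This is only a sketch: the function name `choose_test` and its parameters are hypothetical, and it assumes, as the slide does, that the response variable is continuous.

```python
def choose_test(explanatory_is_continuous: bool, n_groups: int = 0) -> str:
    """Suggest a test for a continuous response variable.

    n_groups is the number of options of a categorical explanatory
    variable; it is ignored when the explanatory variable is continuous.
    """
    if explanatory_is_continuous:
        return "regression"
    if n_groups == 2:
        return "t-test"
    if n_groups > 2:
        return "ANOVA"
    raise ValueError("a categorical explanatory variable needs at least 2 groups")


# Type of Reach has three options (Riffle, Run, Pool).
print(choose_test(explanatory_is_continuous=False, n_groups=3))
# Flow Velocity is continuous.
print(choose_test(explanatory_is_continuous=True))
```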

  5. Statistical Tests. Hypothesis Testing: In statistics, we are always testing a Null Hypothesis (H₀) against an alternative hypothesis (Hₐ). Test Statistic: A number computed from the sample (for example, t or F) that measures how far the data depart from what the null hypothesis predicts. p-value: The probability of observing our data, or more extreme data, assuming the null hypothesis is correct. Statistical Significance: We reject the null hypothesis if the p-value is below a set value, usually 0.05.
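
One way to make the p-value definition concrete is an exact permutation test. This is not the method the slides go on to use; it is swapped in here because it computes the p-value directly from its definition. The function name and the two samples are hypothetical illustration values, not data from the presentation.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Under the null hypothesis the group labels are arbitrary, so every
    way of splitting the pooled values into two groups is equally likely.
    The p-value is the share of splits whose mean difference is at least
    as extreme as the one actually observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_diff(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    extreme = 0
    splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        splits += 1
        sum_a = sum(pooled[i] for i in idx)
        # Small tolerance guards against floating-point ties.
        if abs(mean_diff(sum_a)) >= observed - 1e-12:
            extreme += 1
    return extreme / splits


# Hypothetical densities (fish per m²) at two sites.
site_a = [1, 2, 3, 4]
site_b = [5, 6, 7, 8]
p = permutation_p_value(site_a, site_b)
print(p, p < 0.05)
```

With these values only 2 of the 70 possible splits are as extreme as the observed one, so the null hypothesis would be rejected at the 0.05 level.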

  6. Student’s T-Test: Tests the statistical significance of the difference between means from two independent samples.
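
As a rough sketch, the t statistic behind this test can be computed with the Python standard library. The version below is the pooled-variance (equal variance) form; the function name and the sample values are hypothetical, and turning t into a p-value additionally needs a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    # Standard error of the difference between the two means.
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical densities (fish per m²) at two sites.
t, df = student_t([1, 2, 3, 4], [5, 6, 7, 8])
print(round(t, 3), df)
```

The difference is judged significant when |t| exceeds the tabulated critical value for the given degrees of freedom.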

  7. Compares the mean of the response variable between the 2 groups of a categorical variable. [Figure: Mottled Sculpin/m² at two sites, Cross Plains and Salmo Pond]

  8. Precautions and Limitations • Meet Assumptions • Observations from data with a normal distribution (histogram) • Samples are independent • Assumed equal variance (boxplot) • No other sample biases • Interpreting the p-value

  9. Analysis of Variance (ANOVA): Tests the statistical significance of the difference between means from two or more independent samples. [Figure: Mottled Sculpin/m² in Riffle, Pool, and Run reaches, with the grand mean marked]
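
The role of the grand mean can be seen in a minimal one-way ANOVA sketch: F compares the spread of the group means around the grand mean with the spread of the observations inside each group. The function name and the three samples are hypothetical, and a p-value still requires an F table or a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical densities (fish per m²) by type of reach.
riffle, pool, run = [1, 2, 3], [5, 6, 7], [2, 3, 4]
f_stat, df_b, df_w = one_way_anova_f([riffle, pool, run])
print(round(f_stat, 2), df_b, df_w)
```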

  10. Precautions and Limitations • Meet Assumptions • Observations from data with a normal distribution • Samples are independent • Assumed equal variance • No other sample biases • Interpreting the p-value • Pairwise T-tests to follow

  11. Simple Linear Regression • What is it? Least squares line • When is it appropriate to use? • Assumptions? • What does the p-value mean? The R-value? • How to do it in Excel

  12. Simple Linear Regression Tests the statistical significance of a relationship between two continuous variables, Explanatory and Response

  13. Precautions and Limitations • Meet Assumptions • Observations from data with a normal distribution • Samples are independent • Assumed equal variance • Relationship is linear • No other sample biases • Interpret the p-value and R-squared value.

  14. Residual Plots. Residuals are the vertical distances from the observed points to the best-fit line (observed value minus fitted value). Residuals always sum to zero. Regression chooses the best-fit line to minimize the sum of squared residuals. It is called the Least Squares Line.
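
A short sketch of these statements, using the standard closed-form least squares solution for a line with an intercept. The function names and the (x, y) values are hypothetical illustration data.

```python
from statistics import mean


def least_squares_line(x, y):
    """Return (slope, intercept) of the line minimizing squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def residuals(x, y, slope, intercept):
    """Observed value minus fitted value for every point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical explanatory (x) and response (y) measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(x, y)
res = residuals(x, y, slope, intercept)
print(slope, intercept)
print(abs(sum(res)) < 1e-9)  # residuals sum to zero, up to rounding
```

Plotting `res` against the fitted values gives the residual vs. fitted value plot used to check the assumptions.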

  15. Residuals

  16. Residual vs. Fitted Value Plots. [Figure: observed values shown as points, model values shown as a line]

  17. Residual Plots Can Help Test Assumptions. [Figure: three residual plots centered on 0: “Normal” scatter; Curve (linearity); Fan shape (unequal variance)]

  18. Have we violated any assumptions?

  19. R-Squared and P-value: High R-Squared, Low p-value (significant relationship)

  20. R-Squared and P-value: Low R-Squared, Low p-value (significant relationship)

  21. R-Squared and P-value: High R-Squared, High p-value (NO significant relationship)

  22. R-Squared and P-value: Low R-Squared, High p-value (NO significant relationship)

  23. The p-value indicates the strength of the evidence that a relationship exists between the two variables, not how strong that relationship is. R-Squared indicates how much of the variance in the response is explained by the explanatory variable; you can think of this as a measure of predictability. If R-Squared is low, other variables likely play a role. If it is high, it DOES NOT INDICATE A SIGNIFICANT RELATIONSHIP!
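
The last point can be checked numerically. In simple linear regression the t statistic for the slope depends only on R-squared and the number of points n, through t = sqrt(R² (n − 2) / (1 − R²)). The sketch below uses hypothetical R² and n values, and the critical values are hard-coded from a standard two-sided t table at the 0.05 level.

```python
from math import sqrt


def slope_t_statistic(r_squared, n):
    """t statistic (absolute value) for the slope, from R-squared and n."""
    return sqrt(r_squared * (n - 2) / (1 - r_squared))


# Two-sided critical t values at the 0.05 level, from a standard t table.
CRITICAL_T = {1: 12.706, 28: 2.048}

# (label, R-squared, number of points): hypothetical cases.
cases = [("high R-squared, few points", 0.9, 3),
         ("low R-squared, many points", 0.2, 30)]
for label, r_squared, n in cases:
    t = slope_t_statistic(r_squared, n)
    significant = t > CRITICAL_T[n - 2]
    print(f"{label}: t = {t:.2f}, significant = {significant}")
```

Under these assumptions the high R-squared fit on 3 points is not significant, while the low R-squared fit on 30 points is.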
