
Lecture 5 Association Statistics & Regression Analysis University of California, Merced



Presentation Transcript


  1. Math 15: Introduction to Scientific Data Analysis. Lecture 5: Association Statistics & Regression Analysis. University of California, Merced

  2. Course Lecture Schedule • Quiz Next Week!

  3. Project #1 – Due March 31st, 2008 • Projects can be performed individually or in teams of up to three, with the following rules: • Teams turn in one project report and get the same grade. • A team consists of at most 3 people; no copying between teams! • The team project report must include a title page on which the team describes each member’s contribution. • 10% bonus for projects done individually. • Individual projects must not be copied from anyone else. • No late projects will be accepted! • Project #1 will be posted at UCMCROP by next Monday! UC Merced

  4. Review: Measures of dispersion or variability • Variance or standard deviation • [Figure labels: Mode, Average] • The one on the left is more dispersed than the one on the right: it has a higher variance and standard deviation.

  5. Which is the more precise measurement? • Average 35.49 ml, s = 4.5 • Average 446, s (standard deviation) = 23 mg • Although the standard deviation is a good measure of the precision of a given set of data, it can be difficult to compare the standard deviations of two different types of measurements directly. • You might need to make such a comparison to determine the largest source of uncertainty in an experimentally determined answer.

  6. Get the Right Tool for the Job!

  7. Measures of dispersion or variability • One way to do this comparison is the relative standard deviation (RSD): the ratio of the standard deviation to the mean, multiplied by 100 to give a percentage. • Average 35.49 ml, s = 4.5: RSD = 100 × (4.5/35.49) = 12.7% • Average 446, s = 23 mg: RSD = 100 × (23/446) = 5.2%
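
The RSD arithmetic on this slide can be checked with a short Python sketch. The function name `relative_standard_deviation` and the variable names are illustrative choices, not part of the lecture; the numbers are the ones shown on the slide.

```python
def relative_standard_deviation(std_dev, mean):
    """Return the relative standard deviation as a percentage: 100 * s / mean."""
    if mean == 0:
        raise ValueError("RSD is undefined when the mean is zero")
    return 100.0 * std_dev / abs(mean)


# Values taken from the slide: (s = 4.5, average = 35.49 ml) and (s = 23 mg, average = 446).
rsd_volume = relative_standard_deviation(4.5, 35.49)
rsd_mass = relative_standard_deviation(23, 446)

print(f"RSD (volume) = {rsd_volume:.1f}%")  # RSD (volume) = 12.7%
print(f"RSD (mass)   = {rsd_mass:.1f}%")    # RSD (mass)   = 5.2%
```

Because the RSD has no units, the two measurements can be compared directly: the one with the smaller RSD (5.2%) is the more precise relative to its mean.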

  8. Any Questions?

  9. Common Practice for Data Analysis • A common task in data analysis is to investigate an association between two variables. • To see whether two variables vary together: correlation • To see how one variable affects another: regression

  10. Correlation • A correlation tells us whether two variables vary together, i.e. as one goes up, the other goes up (or goes down). • It is measured by the correlation coefficient (the Pearson product-moment correlation coefficient, or Pearson’s r).

  11. Correlation Coefficient • Varies from +1 (perfect positive correlation) through 0 (no linear correlation) to -1 (perfect negative correlation)
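
As a rough illustration of this range, the sketch below computes Pearson's r from its standard definition, the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the small data sets are made up for illustration and do not come from the lecture.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # y rises with x: r = +1
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises: r = -1
r_none = pearson_r(x, [2, 1, 3, 1, 2])   # no linear trend: r is approximately 0

print(r_up, r_down, round(r_none, 6))
```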

  12. Correlation Coefficient – cont. • Always draw a diagram to check that: • There are no OUTLIERS. If there are outliers, the usual interpretation of r may not apply. • The relation is not curved (r only measures LINEAR correlation).
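
Both warnings can be demonstrated numerically. The sketch below uses made-up data (not from the lecture) and a hypothetical helper `pearson_r`: a single outlier pulls r far below the value for the clean data, and a perfectly curved relationship (y = x²) gives r = 0 even though y is completely determined by x.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up, nearly linear data.
x_clean = [1, 2, 3, 4, 5, 6]
y_clean = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
r_clean = pearson_r(x_clean, y_clean)

# The same data plus one outlier at (7, 0).
r_outlier = pearson_r(x_clean + [7], y_clean + [0.0])

# A perfect but curved relationship: y = x squared, on x values symmetric about 0.
x_curved = [-3, -2, -1, 0, 1, 2, 3]
y_curved = [x ** 2 for x in x_curved]
r_curved = pearson_r(x_curved, y_curved)

print(f"clean data:   r = {r_clean:.3f}")    # close to 1
print(f"with outlier: r = {r_outlier:.3f}")  # much lower
print(f"curved data:  r = {r_curved:.3f}")   # zero, despite a perfect relationship
```

A scatter plot would reveal both problems at a glance, which is why the slide recommends drawing one before trusting r.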

  13. Excel Function – Correlation Coefficient • =CORREL(array1,array2) or • =PEARSON(array1,array2) • Example: lengths of a leg bone (in cm) in penguin mating pairs (positive correlation)

  14. Ice cream sales vs. number of people who drown at sea • Correlation coefficient: 0.927

  15. Wait! What kinds of conclusions can we draw from a correlation?

  16. Examples (Not Good Ones!) • Ice cream sales correlate with the number of people who drown at sea. • Therefore, ice cream causes people to drown. • Since the 1950s, both the atmospheric CO2 level and crime levels have increased sharply. • Hence, atmospheric CO2 causes crime.

  17. Ice cream sales vs. number of people who drown at sea • Correlation coefficient: 0.927

  18. Correlation does not imply causation • No conclusion about the existence or the direction of a cause-and-effect relationship can be drawn solely from the fact that A is correlated with B. • The correlation coefficient only tells you whether the two variables vary together. • Determining whether there is an actual cause-and-effect relationship requires further investigation, even when the relationship between A and B is statistically significant, a large effect size is observed, or a large part of the variance is explained.
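
One way to see how two variables can correlate strongly without either causing the other is a hidden common cause. In the sketch below, every number is synthetic and purely hypothetical: a made-up temperature series drives both a made-up ice cream sales series and a made-up drowning count. Neither derived series depends on the other, yet their correlation is high.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic, hypothetical data: temperature is the common cause.
temperature = [10, 15, 20, 25, 30, 35]

# Each series depends only on temperature plus its own small fixed offset.
sales_offsets = [5, -8, 3, -4, 6, -2]
drowning_offsets = [0.5, -0.3, -0.6, 0.8, -0.2, 0.1]
ice_cream_sales = [12 * t + d for t, d in zip(temperature, sales_offsets)]
drownings = [0.4 * t + e for t, e in zip(temperature, drowning_offsets)]

r = pearson_r(ice_cream_sales, drownings)
print(f"r(ice cream sales, drownings) = {r:.3f}")  # high, although neither causes the other
```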

  19. Any Questions?

  20. Regression • Regression is used when we have some reason to believe that changes in one variable cause changes in the other. • A correlation coefficient is not evidence of a causal relationship. • The simplest kind of causal relationship is a straight-line (or linear) relationship, which is modeled by linear regression.

  21. Linear regression • Linear regression assumes a linear relationship between two variables: • the dependent variable, y, and the independent variable, x. • Mathematically, this relationship can be described by the linear equation y = ax + b, where a is called the slope and b is called the intercept. • This equation allows you to calculate y (dependent) from x (independent); the values of a and b are determined by the least-squares method.
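
A minimal sketch of the least-squares calculation for y = ax + b, using the standard formulas a = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − a·x̄. The function name `least_squares_line` and the sample points are illustrative assumptions; the points are chosen to lie exactly on the line y = 3x + 8 so the result is easy to verify.

```python
def least_squares_line(xs, ys):
    """Return (slope a, intercept b) of the least-squares line y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b


# Illustrative points lying exactly on y = 3x + 8.
xs = [0, 1, 2, 3, 4]
ys = [8, 11, 14, 17, 20]
a, b = least_squares_line(xs, ys)
print(a, b)  # 3.0 8.0
```

With real, scattered data the points will not sit exactly on a line; the same formulas then give the line that minimizes the sum of squared vertical distances from the points.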

  22. Review – Math • Linear equation: y = 3x + 8 • Slope = 3 and intercept = 8

  23. Slope & Intercept formula • Data: lengths of a leg bone (in cm) in penguin mating pairs • [Screenshot labels: Y-values, X-values]

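  24. Predicted Y-values • y = ax + b, where a is the slope and b is the intercept • Excel formula: =$C$10*B3+$C$11 (slope in C10, X-value in B3, intercept in C11) • Don’t forget the $ signs! • [Spreadsheet screenshot: columns B and C, rows 1–12; labels: X-values, X-value]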
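
The Excel formula on this slide has a direct Python counterpart. In the sketch below, `slope` and `intercept` play the role of the fixed cells `$C$10` and `$C$11`, and each entry of `x_values` plays the role of a relative cell such as `B3`. The numeric values are assumptions chosen for illustration, not the penguin data from the lecture.

```python
# Assumed, illustrative values for the fitted line y = a*x + b.
slope = 3.0       # corresponds to the fixed cell $C$10
intercept = 8.0   # corresponds to the fixed cell $C$11

# Each x value corresponds to a relative cell (B3, B4, ...) as the formula is copied down.
x_values = [1, 2, 3, 4]

# Same slope and intercept for every row; only x changes.
predicted_y = [slope * x + intercept for x in x_values]
print(predicted_y)  # [11.0, 14.0, 17.0, 20.0]
```

The `$` signs matter in Excel for the same reason `slope` and `intercept` sit outside the loop here: when the formula is filled down, the X-value reference should move to the next row while the slope and intercept references stay put.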

  25. Plot a linear regression (or trend) line – Part 1 • You can add a linear regression line to the graph.

  26. Plot a linear regression (or trend) line – Part 2 • Right-click on any data point on the graph • Choose Add Trendline • Click on the Options tab, and select Display equation and Display R-squared (don’t forget to check these two boxes!) • Click “OK”

  27. Plot a linear regression (or trend) line – Part 2 – cont. • R² value (R-squared value; RSQ) • A “measure of scatter” • The closer this value is to 1, the more closely the points follow the fitted line and the more accurate the prediction.
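
For readers who want to see where the R² value comes from, here is a minimal sketch that fits a least-squares line and computes R² as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of y from its mean. The function name `r_squared` and both data sets are made up for illustration.

```python
def r_squared(xs, ys):
    """R-squared of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))  # scatter about the line
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                   # scatter about the mean
    return 1.0 - ss_res / ss_tot


# Points exactly on a line: no scatter, so R-squared is 1.
r2_perfect = r_squared([0, 1, 2, 3, 4], [8, 11, 14, 17, 20])

# Made-up points scattered slightly around a line: R-squared is a little below 1.
r2_noisy = r_squared([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 8.0, 9.8, 12.1])

print(r2_perfect)          # 1.0
print(round(r2_noisy, 4))  # slightly less than 1
```

For a straight-line fit with one independent variable, this value equals the square of Pearson's r.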

  28. Let’s review the process! • Plot a graph! (Example: lengths of a leg bone (in cm) in penguin mating pairs) • To see whether two variables vary together: correlation • If there is some reason to believe there is a causal relationship between the two variables, then to see how one variable affects another: regression

  29. Any Questions?
