
Data


cecelial

Presentation Transcript


  1. Data Freshman Clinic II

  2. Overview • Populations and Samples • Presentation • Tables and Figures • Central Tendency • Variability • Confidence Intervals • Error Bars • Student t test • Linear Regression • Applications

  3. Populations and Samples • Population • All possible data points • Entire US population • Every rainfall event in Glassboro (past, present, and future) • Sample • Subset of population • We use samples to estimate population parameters

  4. Presentation • Present clearly, objectively • Properly communicate uncertainty • Compare using valid statistics

  5. Tables Table 1: Water Quality (average of 3 to 5 values)

  6. Figures – Bar Chart • Figure 1: Average Turbidity of Pond Water, Treated and Untreated

  7. Figures – XY Scatter Figure 2: Change in Water Quality

  8. Central Tendency • Example: Turbidity of Treated Water (NTU) • Sample is 1, 3, 3, 6, 8, 10 (n = 6) • Mean = sum of values divided by number of data points, e.g., (1+3+3+6+8+10)/6 = 5.17 NTU • Median = the middle number of the ordered sample • Rank: 1 2 3 4 5 6 • Number: 1 3 3 6 8 10 (ordered) • For an even number of sample points, average the middle two, e.g., (3+6)/2 = 4.5 NTU • For an odd number of sample points, median = middle point
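
The mean and median on this slide can be checked with a few lines of Python. This is a minimal sketch using only the standard library's `statistics` module; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean, median

# Turbidity of treated water (NTU), the sample from the slide.
turbidity = [1, 3, 3, 6, 8, 10]

sample_mean = mean(turbidity)      # sum of values / number of data points
sample_median = median(turbidity)  # averages the two middle values when n is even

print(f"mean   = {sample_mean:.2f} NTU")
print(f"median = {sample_median:.1f} NTU")
```

For an odd-sized sample, `median` returns the single middle value, matching the rule on the slide.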

  9. Variability • Standard deviation of a sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ • $x_i$ = ith data point, $\bar{x}$ = mean of sample, n = number of data points • e.g., [{(1-5.2)^2 + (3-5.2)^2 + (3-5.2)^2 + (6-5.2)^2 + (8-5.2)^2 + (10-5.2)^2}/(6-1)]^0.5 = 3.43 NTU
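
As a rough check of the worked example on this slide, the sample standard deviation can be computed both by hand and with the standard library. This is a sketch; `s_manual` and `s_library` are illustrative names, not from the slides.

```python
import math
from statistics import mean, stdev

turbidity = [1, 3, 3, 6, 8, 10]  # NTU, sample from the slides
n = len(turbidity)
x_bar = mean(turbidity)

# Sum of squared deviations from the mean, divided by n - 1 (sample form),
# then the square root.
s_manual = math.sqrt(sum((x - x_bar) ** 2 for x in turbidity) / (n - 1))

# statistics.stdev also uses the n - 1 denominator.
s_library = stdev(turbidity)

print(f"s = {s_manual:.2f} NTU")
```

Both routes use the n - 1 denominator, which is why they agree; `statistics.pstdev` (dividing by n) would give a smaller value and is not what the slide describes.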

  10. Confidence Interval of Mean • Estimated range within which population mean falls • e.g., 95% confidence interval of mean, based on our sample, is (1.57 ≤ μ ≤ 8.77), where μ = population mean • We are 95% confident true mean of population (from which our sample was drawn) lies within this range • Confidence interval (CI) calculated from sample: $\bar{x} \pm t \, s / \sqrt{n}$ • where $\bar{x}$ = sample mean, t = statistical parameter related to confidence, s = sample standard deviation, and n = sample size

  11. Calculating “t” • In Excel, type “=TINV” into a cell and select the “=” symbol in the formula bar • The Student’s t-distribution inverse formula palette pops up • “Probability” = 1 – confidence level (as a fraction), e.g., if confidence level is 95%, “probability” = 1 – 0.95 = 0.05 • “Deg_freedom” = degrees of freedom = n – 1 • TINV returns “t”, the statistical parameter we need to estimate a confidence interval based on a sample

  12. Calculating a Confidence Interval • For our example: • “TINV” returned 2.57 • t × s / sqrt(n) = 2.57 × 3.43 / sqrt(6) = 3.60 • 5.17 – 3.60 = 1.57 • 5.17 + 3.60 = 8.77 • CI: (1.57 ≤ μ ≤ 8.77) with 95% confidence • i.e., we are 95% confident the population mean lies between 1.57 and 8.77 • Quite wide! • Lower “s” or higher “n” will narrow range
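
The confidence-interval arithmetic on this slide can be sketched in Python. The value t = 2.57 is taken directly from the slide (the TINV result for 95% confidence and 5 degrees of freedom); the function name `confidence_interval` is a hypothetical helper, not something from the original deck.

```python
import math
from statistics import mean, stdev

def confidence_interval(sample, t):
    """Return (lower, upper) = mean -/+ t * s / sqrt(n) for a sample."""
    n = len(sample)
    half_width = t * stdev(sample) / math.sqrt(n)
    centre = mean(sample)
    return centre - half_width, centre + half_width

# t as reported on the slide for probability = 0.05, deg_freedom = 5.
T_95_DF5 = 2.57

lower, upper = confidence_interval([1, 3, 3, 6, 8, 10], T_95_DF5)
print(f"95% CI: {lower:.2f} to {upper:.2f} NTU")
```

Because the half-width scales with s / sqrt(n), a smaller standard deviation or a larger sample narrows the interval, as the last bullet on the slide notes.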

  13. Error Bars • Used to show data variability on a graph • Bar chart, XY,…

  14. Types of Error Bars • Standard Error of Mean • Confidence Interval • Standard Deviation • Percentage • http://www.graphpad.com/articles/errorbars.htm

  15. Adding Error Bars • 1. Create chart in Excel • 2. Select a data series by selecting a data point or bar • 3. From “Format” menu, select “Selected data series…” • 4. Select “custom” • 5. Select + and – error bar data. This could be standard deviation, standard error, or confidence limits.

  16. Error Bars and our Example • Standard Error of Mean • s / sqrt(n) = 3.43 / sqrt(6) = 1.40 • Put 1.40 in + and - cells • Since the mean = 5.17, the error bars in a bar chart would go from • 5.17 – 1.40 = 3.77 to • 5.17 + 1.40 = 6.57
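
The standard-error error bars for the example can be reproduced as follows. This is a minimal sketch; `bar_low` and `bar_high` are illustrative names for the ends of the error bar, not terms from the slides.

```python
import math
from statistics import mean, stdev

turbidity = [1, 3, 3, 6, 8, 10]  # NTU, sample from the slides

# Standard error of the mean: s / sqrt(n).
se = stdev(turbidity) / math.sqrt(len(turbidity))

# Ends of the error bar drawn around the sample mean.
bar_low = mean(turbidity) - se
bar_high = mean(turbidity) + se

print(f"SE = {se:.2f}, error bar from {bar_low:.2f} to {bar_high:.2f} NTU")
```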

  17. Interpreting Error Bars • Error bars can be used to compare two sample means • Standard Error (SE) • If SE bars do not overlap, no conclusions can be drawn • If SE bars overlap, the samples do not appear to be drawn from significantly different populations • Confidence Interval (CI) • If CI bars do not overlap, the samples appear to be drawn from significantly different populations, at the confidence level of the confidence interval • If CI bars overlap, no conclusions can be drawn • http://www.graphpad.com/articles/errorbars.htm

  18. Comparing Samples with a t-test • Example - You measure untreated and treated pond water • Treated: mean = 2 NTU, s = 0.5 NTU, n = 20 • Untreated: mean = 3 NTU, s = 0.6 NTU, n = 20 • You ask the question – Is the average turbidity of treated water different from that of untreated water? • Use a t-test

  19. Is the water different? • Use TTEST (Excel) • TTEST returns the probability (as a fraction) of being wrong if you claim a statistically significant difference (Type I error) • Select significance level ahead of time, usually 0.01 to 0.1 • For our example, the returned value, 0.0000015, is very small

  20. T test steps • Identify two samples to compare • Select α, the significance level of the statistical test • We’ll use 0.05 in this class • Confidence = 1 – α • Use Excel “TTEST” formula to estimate probability of Type I error • If probability returned by TTEST is less than or equal to 0.05, assume the samples come from two different populations • For our example, 0.0000015 < 0.05, so assume the treated water is different from the untreated water
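
The steps above can be sketched in Python from the summary statistics given for the treated and untreated water. The slides do not say which TTEST options were used, so this sketch assumes a two-tailed test with equal (pooled) variances; the helper names `t_pdf`, `two_tailed_p`, and `pooled_t` are hypothetical, and the p-value is obtained by numerically integrating the Student t density rather than by calling Excel.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, span=100.0, steps=20_000):
    """Two-tailed p-value: Simpson's rule on [|t|, |t| + span], doubled."""
    a = abs(t)
    h = span / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(a + span, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 2 * total * h / 3

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic and degrees of freedom, ASSUMING equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

ALPHA = 0.05  # significance level used in this class

# Untreated: mean 3 NTU, s 0.6, n 20. Treated: mean 2 NTU, s 0.5, n 20.
t, df = pooled_t(3.0, 0.6, 20, 2.0, 0.5, 20)
p = two_tailed_p(t, df)
different = p <= ALPHA

print(f"t = {t:.2f}, df = {df}, p = {p:.1e}, different at alpha={ALPHA}: {different}")
```

Under these assumptions the p-value comes out on the order of 10^-6, the same order of magnitude as the 0.0000015 reported on the slide, and well below 0.05.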

  21. Linear Regression • Fit the best straight line to a data set • Right-click on data point and use “trendline” option • Use “options” tab to show equation and R²

  22. R² – Coefficient of Multiple Determination • ŷi = predicted y values, from regression equation • ȳ = average of y • yi = observed y values • R² = fraction of variance explained by regression (variance = standard deviation squared) • R² = 1 if data lie along a straight line
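
The formula image for R² did not survive in this transcript. The sketch below assumes the common "explained over total" form, R² = Σ(ŷi − ȳ)² / Σ(yi − ȳ)², which matches the description "fraction of variance explained by regression". The line is fitted by ordinary least squares, and the stroke-rate and flow-rate numbers are hypothetical values made up for illustration, not measurements from the class.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope and intercept of the best straight line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def r_squared(xs, ys, slope, intercept):
    """Assumed form: sum((y_hat - y_bar)^2) / sum((y - y_bar)^2)."""
    y_bar = mean(ys)
    predicted = [slope * x + intercept for x in xs]
    explained = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
    total = sum((y - y_bar) ** 2 for y in ys)
    return explained / total

# HYPOTHETICAL data for illustration only.
stroke_rate = [10, 20, 30, 40, 50]
flow_rate = [1.1, 1.9, 3.2, 3.9, 5.1]

slope, intercept = fit_line(stroke_rate, flow_rate)
r2 = r_squared(stroke_rate, flow_rate, slope, intercept)
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r2:.3f}")
```

For a least-squares line with an intercept, this form gives the same value as 1 − Σ(yi − ŷi)² / Σ(yi − ȳ)², so either reading of the missing formula leads to the same number here.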

  23. What might you do in this class? • Flow rate versus stroke rate • Figure with linear regression over linear range • Ability to improve water quality • Table and t-test comparison with untreated water (for turbidity and apparent color), or • Bar chart (for turbidity and apparent color) with confidence interval error bars • Pressure change versus flow rate, power versus flow rate • Figure (no statistics possible because we only took one reading of pressure for each flow rate and the relationship is non-linear) • Force versus stroke rate • Figure with 95% confidence interval error bars for each data point • Power versus flow rate • Figure

  24. Example – Water Quality • Table 2: Improvement in Water Quality • Note: Statistical significance tested at α level = 0.05 using t-test
