Statistics: Data Analysis and Presentation





### Statistics: Data Analysis and Presentation

Fr Clinic II

Overview
• Tables and Graphs
• Populations and Samples
• Mean, Median, and Standard Deviation
• Standard Error & 95% Confidence Interval (CI)
• Error Bars
• Comparing Means of Two Data Sets
• Linear Regression (LR)
Warning
• Statistics is a huge field, and I’ve simplified considerably here. For example:
  • Mean, Median, and Standard Deviation
    • There are alternative formulas
  • Standard Error and the 95% Confidence Interval
    • There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means, rather than single mean…)
  • Error Bars
    • Don’t go beyond the interpretations I give here!
  • Comparing Means of Two Data Sets
    • We just cover the t test for two means when the variances are unknown but equal; there are other tests
  • Linear Regression
    • We only look at simple LR and only calculate the intercept, slope and R². There is much more to LR!
Tables

Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters

• Consistent Format, Title, Units, Big Fonts
Figures
• Consistent Format, Title, Units
• Good Axis Titles, Big Fonts

Figure 1: Turbidity of Pond Water, Treated and Untreated

Populations and Samples
• Population
  • All of the possible outcomes of experiment or observation
    • US population
    • Particular type of steel beam
• Sample
  • A finite number of outcomes measured or observations made
    • 1000 US citizens
    • 5 beams
• We use samples to estimate population properties
  • Mean, Variability (e.g. standard deviation), Distribution
  • Height of 1000 US citizens used to estimate mean of US population
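
As a rough illustration of using a sample to estimate a population property, the Python sketch below builds a synthetic population and compares the mean of a 1000-value sample with the population mean. The population (normally distributed heights with mean 170 cm and standard deviation 10 cm), the seed, and all variable names are assumptions made for the example, not data from the slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 heights in cm (assumed values, not real data)
population = [random.gauss(170, 10) for _ in range(100_000)]

# Sample: 1000 values drawn without replacement
sample = random.sample(population, 1000)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.2f} cm")
print(f"Sample mean:     {sample_mean:.2f} cm")
```

In practice the population mean is unknown; the sample mean is the estimate, and the later slides on standard error and confidence intervals describe how good that estimate is.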
Mean and Median
• Turbidity of Treated Water (NTU): 1, 3, 3, 6, 8, 10
• Mean = Sum of values divided by number of samples
  = (1+3+3+6+8+10)/6
  = 5.2 NTU
• Median = The middle number
  Rank: 1 2 3 4 5 6
  Number: 1 3 3 6 8 10
• For even number of sample points, average middle two
  = (3+6)/2 = 4.5 NTU
• Excel: AVERAGE for the mean; MEDIAN for the median
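
The same calculation can be sketched outside Excel. The snippet below uses Python's standard `statistics` module on the six turbidity values from the slide; the variable names are chosen for the example.

```python
import statistics

# Turbidity of treated water (NTU), values from the slide
turbidity_ntu = [1, 3, 3, 6, 8, 10]

mean_ntu = statistics.mean(turbidity_ntu)      # plays the role of Excel AVERAGE
median_ntu = statistics.median(turbidity_ntu)  # plays the role of Excel MEDIAN

print(f"Mean:   {mean_ntu:.1f} NTU")
print(f"Median: {median_ntu:.1f} NTU")
```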

Variance
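• Measure of variability
• Sum of the squares of the deviations about the mean, divided by the degrees of freedom (n − 1):

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

• n = number of data points, $x_i$ = individual values, $\bar{x}$ = sample mean
• Excel: VAR for the variance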
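
A minimal sketch of the variance calculation, assuming the sample (n − 1) form described above and reusing the turbidity values from the Mean and Median slide. It computes the result step by step and then checks it against `statistics.variance`, which also divides by n − 1.

```python
import statistics

turbidity_ntu = [1, 3, 3, 6, 8, 10]  # NTU, from the Mean and Median slide

n = len(turbidity_ntu)
mean_ntu = sum(turbidity_ntu) / n

# Sum of squared deviations about the mean
sum_sq_dev = sum((x - mean_ntu) ** 2 for x in turbidity_ntu)

# Divide by the degrees of freedom, n - 1
variance_manual = sum_sq_dev / (n - 1)

# Library equivalent (sample variance, like Excel VAR)
variance_lib = statistics.variance(turbidity_ntu)

print(f"Variance (manual):  {variance_manual:.2f} NTU^2")
print(f"Variance (library): {variance_lib:.2f} NTU^2")
```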

Standard Deviation, s
• Square-root of the variance
• For phenomena following a Normal Distribution (bell curve), 95% of population values lie within 1.96 standard deviations of the mean
• Area under curve is probability of getting value within specified range
• Excel: STDEV for the standard deviation

[Figure: Normal distribution curve. x-axis: Standard Deviations from Mean. 95% of the area lies between -1.96 and 1.96.]
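
The 1.96 figure can be checked numerically. The sketch below uses `statistics.NormalDist` (Python 3.8 or later) to find the area under the standard normal curve between -1.96 and 1.96, and also computes the standard deviation of the turbidity values from the earlier slide.

```python
from statistics import NormalDist, stdev

turbidity_ntu = [1, 3, 3, 6, 8, 10]  # NTU, from the Mean and Median slide

# Sample standard deviation (square root of the n - 1 variance, like Excel STDEV)
s = stdev(turbidity_ntu)

# Area under the standard normal curve within 1.96 standard deviations of the mean
standard_normal = NormalDist(mu=0, sigma=1)
fraction_within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(f"s = {s:.2f} NTU")
print(f"Fraction within +/-1.96 standard deviations: {fraction_within:.4f}")
```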

Standard Error of Mean
• Standard error of mean, for a sample of size n taken from a population with standard deviation s:

$$SE = \frac{s}{\sqrt{n}}$$

• Estimate of mean depends on sample selected
• As n increases, variance of mean estimate goes down, i.e., estimate of population mean improves
• As n increases, mean estimate distribution approaches normal, regardless of population distribution
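
The sketch below does two things, under stated assumptions. First it computes the standard error of the mean, s divided by the square root of n, for the turbidity values from the Mean and Median slide. Then it runs a small simulation showing that sample means scatter less as n grows; the skewed exponential population, the seed, and the helper name `spread_of_sample_means` are all invented for the illustration.

```python
import math
import random
import statistics

turbidity_ntu = [1, 3, 3, 6, 8, 10]  # NTU, from the Mean and Median slide

# Standard error of the mean: s / sqrt(n)
s = statistics.stdev(turbidity_ntu)
n = len(turbidity_ntu)
standard_error = s / math.sqrt(n)
print(f"Standard error: {standard_error:.2f} NTU")

random.seed(2)  # fixed seed so the simulation is repeatable


def spread_of_sample_means(sample_size, repeats=2000):
    """Standard deviation of many sample means drawn from a skewed
    (exponential) population; hypothetical population for illustration."""
    means = []
    for _ in range(repeats):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)


spread_small = spread_of_sample_means(5)
spread_large = spread_of_sample_means(50)
print(f"Spread of means, n = 5:  {spread_small:.3f}")
print(f"Spread of means, n = 50: {spread_large:.3f}")
```

The spread for n = 50 should come out at roughly one third of the spread for n = 5, in line with the 1/√n dependence.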
95% Confidence Interval (CI) for Mean
• Interval within which we are 95 % confident the true mean lies:

$$\bar{x} \pm t_{95\%,\,n-1} \, \frac{s}{\sqrt{n}}$$

• $t_{95\%,\,n-1}$ is the t-statistic for a 95% CI if sample size = n
• If n ≥ 30, let $t_{95\%,\,n-1}$ = 1.96 (Normal Distribution)
• Otherwise, use Excel formula: TINV(0.05,n-1)
• n = number of data points
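
A minimal sketch of the 95% CI calculation for the six turbidity values from the Mean and Median slide. Python's standard library has no inverse t function, so the t value is entered by hand: 2.571 is the standard two-sided 95% table value for 5 degrees of freedom, which is what TINV(0.05, 5) would be expected to return. The function name `confidence_interval` is chosen for the example.

```python
import math
import statistics

turbidity_ntu = [1, 3, 3, 6, 8, 10]  # NTU, from the Mean and Median slide

# Two-sided 95% t value for n - 1 = 5 degrees of freedom (from a t table)
T_95_DF5 = 2.571


def confidence_interval(data, t_value):
    """Return (lower, upper) bounds of mean +/- t * s / sqrt(n)."""
    n = len(data)
    mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    half_width = t_value * standard_error
    return mean - half_width, mean + half_width


lower, upper = confidence_interval(turbidity_ntu, T_95_DF5)
print(f"95% CI for the mean: {lower:.2f} to {upper:.2f} NTU")
```

With only six data points the interval is wide, which is the practical consequence of the small n in the standard error.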
Error Bars
• Show data variability on plot of mean values
• Types of error bars include:
• ± Standard Deviation, ± Standard Error, ± 95% CI
• Maximum and minimum value
Using Error Bars to compare data
• Standard Deviation
  • Demonstrates data variability, but no comparison possible
• Standard Error
  • If bars overlap, any difference in means is not statistically significant
  • If bars do not overlap, indicates nothing!
• 95% Confidence Interval
  • If bars overlap, indicates nothing!
  • If bars do not overlap, difference is statistically significant
• We’ll use 95 % CI
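
The interpretation rules above can be written down as a small lookup. The sketch below is only a restatement of the slide's rules in Python; the function names, the bar-type labels, and the two example filters with their means and bar half-widths are hypothetical.

```python
def bars_overlap(mean_a, half_width_a, mean_b, half_width_b):
    """True if the intervals mean +/- half_width of the two bars overlap."""
    low_a, high_a = mean_a - half_width_a, mean_a + half_width_a
    low_b, high_b = mean_b - half_width_b, mean_b + half_width_b
    return low_a <= high_b and low_b <= high_a


def interpret(bar_type, overlap):
    """Apply the slide's rules. bar_type is 'sd', 'se', or 'ci95'."""
    if bar_type == "sd":
        return "no comparison possible"
    if bar_type == "se":
        return ("difference is not statistically significant"
                if overlap else "indicates nothing")
    if bar_type == "ci95":
        return ("indicates nothing"
                if overlap else "difference is statistically significant")
    raise ValueError(f"unknown bar type: {bar_type}")


# Hypothetical example: mean turbidity (NTU) with 95% CI half-widths
overlap = bars_overlap(2.0, 0.25, 3.0, 0.30)
conclusion = interpret("ci95", overlap)
print(f"Bars overlap: {overlap}; conclusion: {conclusion}")
```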
Example 1

Create Bar Chart of Name vs Mean. Right click on data. Select “Format Data Series”.

What can we do?
• Plot mean water quality data for various filters with error bars
• Plot mean water quality over time with error bars
Comparing Filter Performance
• Use t test to determine if the means of two populations are different.
• Based on two data sets
• E.g., turbidity produced by two different filters
Comparing Two Data Sets using the t test
• Example - You pump 20 gallons of water through each of two filters, Filter 1 and Filter 2. After every gallon, you measure the turbidity.
• Filter 1: Mean = 2 NTU, s = 0.5 NTU, n = 20
• Filter 2: Mean = 3 NTU, s = 0.6 NTU, n = 20
• You ask the question - Do the Filters make water with a different mean turbidity?
Do the Filters make different water?
• Use TTEST (Excel)
• Fractional probability of being wrong if you answer yes
• We want probability to be small: 0.01 to 0.10 (1 to 10 %). Use 0.01
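
Excel's TTEST works on the raw data arrays. When only the summary values from the example are available (mean, s, n for each filter), the equal-variance t test can be sketched by hand. The Python below is a sketch under that assumption: it pools the two variances, computes the t statistic, and estimates the two-tailed probability by numerically integrating the t distribution with Simpson's rule, since the standard library has no t distribution. All function names are chosen for the example.

```python
import math


def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """t statistic and degrees of freedom, assuming equal variances."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coefficient = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coefficient * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_probability(t, df, steps=20000):
    """Area in both tails beyond |t|, via Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return max(0.0, 1 - 2 * area_zero_to_t)


# Summary values from the example: Filter 1 and Filter 2
t_stat, df = pooled_t_statistic(2.0, 0.5, 20, 3.0, 0.6, 20)
p_value = two_tailed_probability(t_stat, df)

print(f"t = {t_stat:.3f}, degrees of freedom = {df}")
print(f"Two-tailed probability = {p_value:.2e}")
print("Different at the 0.01 level:", p_value < 0.01)
```

For these values the probability is far below 0.01, so the answer to "Do the filters make water with a different mean turbidity?" is yes at the chosen threshold.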
“t test” Questions
• Do two filters make different water?
• Take multiple measurements of a particular water quality parameter for 2 filters
• Do two filters treat different amounts of water between cleanings?
• Measure amount of water filtered between cleanings for two filters
• Does the amount of water a filter treats between cleanings differ after a certain amount of water is treated?
• For a single filter, measure the amount of water treated between cleanings before and after a certain total amount of water is treated
Linear Regression
• Fit the best straight line to a data set

Right-click on a data point and use the “trendline” option. Use the “options” tab to get the equation and R².

R²: Coefficient of Multiple Determination
• ŷᵢ = Predicted y values, from regression equation
• yᵢ = Observed y values
• R² = fraction of variance explained by regression (variance = standard deviation squared)
• R² = 1 if data lies along a straight line
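
The slide's own R² formula did not survive in this transcript, so the sketch below uses the standard least-squares definitions as an assumption: slope and intercept from the sums of squares, and R² as one minus the residual sum of squares over the total sum of squares. The data (turbidity against gallons filtered) and the function name `linear_fit` are hypothetical.

```python
def linear_fit(x, y):
    """Least-squares slope, intercept, and R^2 for a straight-line fit."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Predicted values from the regression equation
    predicted = [intercept + slope * xi for xi in x]

    ss_residual = sum((yi - yhat) ** 2 for yi, yhat in zip(y, predicted))
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, r_squared


# Hypothetical data: gallons filtered vs. turbidity (NTU)
gallons = [1, 2, 3, 4, 5]
turbidity_ntu = [1.1, 1.9, 3.2, 3.8, 5.0]

slope, intercept, r_squared = linear_fit(gallons, turbidity_ntu)
print(f"y = {slope:.2f} x + {intercept:.2f}, R^2 = {r_squared:.3f}")
```

The result should match what Excel's trendline option reports for the same points.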