Statistical Methods For UO Lab — Part 1

Calvin H. Bartholomew

Chemical Engineering

Brigham Young University

Background
• Statistics is the science of problem-solving in the presence of variability (Mason 2003).
• Statistics enables us to:
• Assess the variability of measurements
• Avoid bias from unconsidered causes of variation
• Determine the probability of factors and risks
• Build good models
• Obtain best estimates of model parameters
• Improve chances of making correct decisions
• Make most efficient and effective use of resources
Some U.S. Cultural Statistics
• 58.4% of us have called in sick to work when we weren't.
• 3 out of 4 of us store our dollar bills in rigid order with singles leading up to higher denominations.
• 50% admit they regularly sneak food into movie theaters to avoid the high prices of snack foods.
• 39% of us peek in our host's bathroom cabinet.
• 17% have been caught by the host.
• 81.3% would tell an acquaintance to zip his pants.
• 29% of us ignore RSVPs.
• 35% give to charity at least once a month.
• 71.6% of us eavesdrop.
Population vs. Sample Statistics

Population statistics

Characterizes the entire population, which is generally the unknown information we seek

Mean generally designated μ

Variance & standard deviation generally designated as σ² and σ, respectively

Sample statistics

Characterizes a random, hopefully representative, sample: typically the data from which we infer population statistics

Mean generally designated x̄

Variance & standard deviation generally designated as s² and s, respectively
Point vs. Model Estimation

Point estimation

Characterizes a single, usually global, measurement

Generally requires simple mathematical and statistical analysis

Procedures are unambiguous

Model development

Characterizes a dependent variable as a function of one or more independent variables

Complexity of parameter estimation and statistical analysis depends on model complexity

Parameter estimation and especially statistics are somewhat ambiguous
Overall Approach
• Use sample statistics to estimate population statistics
• Use statistical theory to indicate the accuracy with which the population statistics have been estimated
• Use linear or nonlinear regression methods/statistics to fit data to a model and to determine goodness of fit
• Use trends indicated by theory to optimize experimental design
Sample Statistics
• Estimate properties of the probability density function (PDF), e.g., the mean and standard deviation, using Gaussian statistics
• Use the Student t-test to determine confidence intervals from the sample variance
• Estimate random errors in the measurement of data
• For variables that are geometric functions of several basic variables, use the propagation-of-errors approach to estimate (a) the probable error (PE) and (b) the maximum possible error (MPE)
• PE and MPE can be estimated by the differential method; MPE can also be estimated by the brute force method
• Determine systematic errors (bias)
• Compare estimated errors from measurements with calculated errors from statistics; this reveals whether the method of measurement or the quantity of data is limiting

Some definitions:

x̄ = sample mean

s = sample standard deviation

μ = exact mean

σ = exact standard deviation

As the sample becomes larger:

x̄ → μ, s → σ, and the t chart → the z chart

(Not valid if bias exists, e.g., if the calibration is off.)

Random Error: Single Variable (e.g., T)

Questions

• Several measurements are obtained for a single variable (e.g., T).
• What is the true value?
• How confident are you?
• Is the value different on different days?

How do you determine the bounds of μ?

• Let’s assume a “normal” Gaussian distribution
• For a small sample: s is known from the data; use t tables (we’ll pursue this approach)
• For a large sample (n > 30): σ is assumed (σ ≈ s); use z tables for this approach

Properties of a Normal PDF
• About 68.27%, 95.45%, and 99.73% of data lie within 1, 2, and 3 standard deviations of the mean, respectively.
• When the mean is zero and the standard deviation is 1, it is referred to as a standard normal distribution.
• Plays a fundamental role in statistical analysis because of the Central Limit Theorem.
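The 1-, 2-, and 3-standard-deviation fractions can be checked directly from the error function, since for a standard normal variable Z, P(|Z| < k) = erf(k/√2). A minimal sketch using only the Python standard library (the function name fraction_within is ours, chosen for illustration):

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal population lying within k standard deviations of the mean.

    For a standard normal variable Z, P(|Z| < k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * fraction_within(k):.2f}%")
```

This prints 68.27%, 95.45%, and 99.73%, which is where the percentages on the slide come from.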
Central Limit Theorem
• The distribution of sample means calculated from a large data set is approximately normal
• The approximation becomes more accurate with larger numbers of samples
• The sample mean approaches the true mean as n → ∞
• Assumes distributions are not peaked close to a boundary and variances are finite
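A quick simulation can make the Central Limit Theorem concrete. The sketch below assumes a uniform(0, 1) population (deliberately non-normal, with σ = √(1/12)) and a fixed random seed; it draws many samples of size n = 30 and checks that their means center on 0.5 with a spread close to σ/√n. The sample size and trial count are arbitrary illustration choices, not values from the slides.

```python
import random
import statistics


def sample_means(n: int, trials: int, seed: int = 1) -> list[float]:
    """Means of `trials` samples, each of n draws from a uniform(0, 1) population."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]


n = 30
means = sample_means(n, trials=5000)

print(statistics.fmean(means))     # close to the population mean, 0.5
print(statistics.stdev(means))     # close to the prediction on the next line
print((1 / 12) ** 0.5 / n ** 0.5)  # sigma / sqrt(n), about 0.0527
```

A histogram of `means` would look bell-shaped even though the individual draws are flat, which is the practical content of the theorem.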
Student t-Distribution
• Widely used in hypothesis testing and determining confidence intervals
• Approaches the normal distribution for large sample sizes
• “Student” is a pseudonym, not an adjective; the author’s actual name was W. S. Gosset, who published in the early 1900s.
Student t-Distribution
• Used to compute confidence intervals according to $\mu = \bar{x} \pm t_{\alpha,r}\, s/\sqrt{n}$
• Assumes the mean and variance are estimated by sample values
• The value of t decreases with increasing degrees of freedom (DOF), i.e., with the number of data points n, and increases with increasing % confidence

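Student t-test (determine error from s)

[Figure: t distribution with 5% of the area shaded in each tail; horizontal axis t]

α = 1 - probability

r = n - 1 (degrees of freedom)

$\text{error} = t_{\alpha,r}\, s/\sqrt{n}$

e.g., from Example 1: n = 7, s = 3.27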

Values of Student t Distribution
• Depend on both the desired confidence level and the amount of data.
• Degrees of freedom are n - 1, where n = number of data points (assumes the mean and variance are estimated from the data).
• This table assumes a two-tailed distribution of area.
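If a printed t table is not at hand, two-tailed critical values can be computed from the t density. The sketch below is a standard-library-only approach under our own choices: it integrates the Student t density with Simpson's rule and bisects for the value t with P(|T| > t) = α. It is illustrative, not a validated numerical library; with SciPy installed, `scipy.stats.t.ppf(1 - alpha / 2, dof)` gives the same number.

```python
import math


def t_pdf(x: float, dof: int) -> float:
    """Density of Student's t distribution with `dof` degrees of freedom."""
    # lgamma keeps the normalizing constant finite for large dof
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm) * (1 + x * x / dof) ** (-(dof + 1) / 2)


def t_cdf(x: float, dof: int, steps: int = 1000) -> float:
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule.

    Uses the symmetry P(T <= 0) = 0.5; `steps` must be even.
    """
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, dof) + t_pdf(b, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(alpha: float, dof: int) -> float:
    """Two-tailed critical value t with P(|T| > t) = alpha (alpha = 1 - confidence)."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1000.0
    for _ in range(50):  # bisection works because t_cdf increases with t
        mid = (lo + hi) / 2
        if t_cdf(mid, dof) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.05, 4), 4))  # 95% confidence, r = 4: 2.7764
```

For r = 4 at 95% this reproduces the tabulated 2.77645 used in Example 2, and for very large r it approaches the z value of 1.96.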
Example 2
• Five data points with sample mean and standard deviation of 713.6 and 107.8, respectively.
• The estimated population mean and 95% confidence interval are (from the previous table, $t_{\alpha,r}$ = 2.77645 with r = 4):
$\mu = 713.6 \pm \frac{(2.77645)(107.8)}{\sqrt{5}} = 713.6 \pm 133.9$
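The Example 2 arithmetic can be reproduced with the interval formula x̄ ± t·s/√n. This is a minimal sketch; the t value 2.77645 is taken from the slide rather than computed, and the helper name t_confidence_interval is ours.

```python
import math


def t_confidence_interval(mean: float, s: float, n: int, t_value: float) -> tuple[float, float]:
    """Confidence interval for the population mean: mean +/- t * s / sqrt(n)."""
    half_width = t_value * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Example 2: n = 5, sample mean 713.6, s = 107.8, t = 2.77645 (95%, r = n - 1 = 4)
low, high = t_confidence_interval(713.6, 107.8, 5, 2.77645)
print(f"{low:.1f} to {high:.1f}")  # 579.7 to 847.5
```

So, with 95% confidence, the population mean lies roughly between 579.7 and 847.5.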

Example 3: Comparing Averages

Day 1:

Day 2:

What is your confidence that μx ≠ μy?

99% confident different

1% confident same

Degrees of freedom: r = nx + ny - 2
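The nx + ny - 2 degrees of freedom suggest a pooled-variance two-sample t-test; we assume that form here. The sketch computes the t statistic for two samples. The readings are hypothetical, since the Day 1 and Day 2 data are not available in this text.

```python
import math
import statistics


def pooled_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom).

    Assumes both samples come from normal populations with equal variance.
    """
    nx, ny = len(x), len(y)
    dof = nx + ny - 2
    # Pooled variance: DOF-weighted average of the two sample variances
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / dof
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, dof


# Hypothetical readings, not the slide's Day 1 / Day 2 data
day1 = [1.0, 2.0, 3.0]
day2 = [4.0, 5.0, 6.0]
t, dof = pooled_t(day1, day2)
print(round(t, 3), dof)  # -3.674 4
```

Compare |t| with the two-tailed table value for r = nx + ny - 2. Here |t| = 3.674 exceeds the 95% value for r = 4 (2.776) but not the 99% value (4.604), so these hypothetical days differ at 95% confidence but not at 99%.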

Error Propagation: Multiple Variables

Obtain a value (e.g., from a model) using multiple input variables.

What is the uncertainty of your value?

Each input variable has its own error.

Example: How much ice cream do you buy for the AIChE event? Ice cream = f(time of day, tests, …)

Example: You take measurements of ρ, A, and v to determine m = ρAv. What is the range of m and its associated uncertainty?

Value and Uncertainty

• Managers use values to make decisions, so the uncertainty of a value must be specified
• The ethics and societal impact of values are important
• How do you determine the uncertainty of a value?
• Sources of uncertainty:
• Estimation: we guess!
• Discrimination: device accuracy (single data point)
• Calibration: may not be exact (error of curve fit)
• Technique: e.g., measuring ID rather than OD
• Constants and data: not always exact!
• Noise: which reading do we take?
• Model and equations: e.g., ideal gas law vs. real gas
• Humans: transposing, …

Estimates of Error (δ) for Input Variable

(Methods or rules)

1. Measured variable (as we just did): measure multiple times and obtain s; δ ≈ 2.57s (the t chart gives t > 2.57 for 99% confidence). E.g., s = 2.3 ºC for a thermocouple gives δ ≈ 5.9 ºC.
2. Tabulated variable: δ ≈ 2.57 times the last reported significant digit (e.g., ρ = 1.0 g/ml at 0 ºC, δ = 0.257 g/ml).
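Rules 1 and 2 are simple multiplications by 2.57. A minimal sketch follows; the function names and the thermocouple readings are hypothetical, used only to show where s comes from.

```python
import statistics

T_99 = 2.57  # multiplier from the slide (about 99% confidence for large n)


def delta_measured(s: float) -> float:
    """Rule 1 (measured variable): delta ~ 2.57 * s."""
    return T_99 * s


def delta_tabulated(last_digit: float) -> float:
    """Rule 2 (tabulated variable): delta ~ 2.57 * place value of the last significant digit."""
    return T_99 * last_digit


# Hypothetical repeated thermocouple readings, deg C (not data from the slides)
readings = [21.4, 24.9, 19.6, 23.8, 18.7, 22.5]
s = statistics.stdev(readings)
print(f"s = {s:.2f} C, delta = {delta_measured(s):.2f} C")

print(round(delta_measured(2.3), 1))   # slide example: s = 2.3 C -> 5.9
print(round(delta_tabulated(0.1), 3))  # rho = 1.0 g/ml -> 0.257 g/ml
```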

Estimates of Error (δ) for Variable

3. Manufacturer specs: use the given accuracy data (e.g., pump is ± 1 ml/min, δ = 1 ml/min)
4. Variable from regression (e.g., calibration curve): δ ≈ standard error (e.g., velocity from an equation with std. error = 2 m/s, δ = 2 m/s)
5. Judgment for a variable: use judgment for δ (e.g., graph gives pressure to ± 1 psi, δ = 1 psi)

Calculating Maximum or Probable Error

• Maximum error can be calculated by either of two methods:
• Brute force method
• Differential method
• Probable error is more realistic: positive and negative errors can partially cancel, lowering the overall error. You need standard deviations (s or σ) to calculate the probable error (PE) (see the previous example). PE = δ = 2.57s

$\Psi = y \pm 1.96\sqrt{s_y^2}$ (95% confidence)

$\Psi = y \pm 2.57\sqrt{s_y^2}$ (99% confidence)
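The slide does not spell out how $s_y^2$ is obtained; we assume the usual first-order propagation of independent errors, $s_y^2 = \sum_i (\partial y/\partial x_i)^2 s_i^2$. The sketch below applies that formula to a hypothetical y = x1·x2 with made-up standard deviations.

```python
import math


def probable_error(partials: list[float], sigmas: list[float], z: float = 1.96) -> float:
    """Probable error z * sqrt(s_y^2) for y = f(x_1, ..., x_k).

    Assumes independent input errors and first-order propagation:
        s_y^2 = sum_i (dy/dx_i)^2 * s_i^2
    z = 1.96 gives about 95% confidence, z = 2.57 about 99%.
    """
    s_y2 = sum((p * s) ** 2 for p, s in zip(partials, sigmas))
    return z * math.sqrt(s_y2)


# Hypothetical: y = x1 * x2 with x1 = 2.0 (s1 = 0.1) and x2 = 3.4 (s2 = 0.05)
x1, x2 = 2.0, 3.4
partials = [x2, x1]  # dy/dx1 = x2, dy/dx2 = x1
sigmas = [0.1, 0.05]
y = x1 * x2
print(f"95%: {y:.2f} +/- {probable_error(partials, sigmas):.2f}")
print(f"99%: {y:.2f} +/- {probable_error(partials, sigmas, z=2.57):.2f}")
```

Because the terms are squared before summing, this probable error is always smaller than the worst-case sum of |∂y/∂x_i|·(z·s_i).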

Calculating Maximum (Worst) Error

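1. Brute force method: substitute the upper and lower limits of all x’s into the function to get the maximum and minimum values of y. The range of y (Ψ) is between y_min and y_max.

2. Differential method: from a given model

$y = f(a, b, c, \ldots, x_1, x_2, x_3, \ldots)$

where a, b, c, … are exact constants and x1, x2, x3, … are independent variables,

$dy = \sum_i \left|\frac{\partial y}{\partial x_i}\right| \delta_i$

Range of y (Ψ) = y ± dy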
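Both maximum-error methods can be written generically. In this sketch, partial derivatives come from central finite differences (our choice; analytic derivatives work equally well), and the brute force method evaluates every combination of upper and lower limits. That combination approach assumes f is monotonic in each variable over the range. The model y = x1·x2/x3 and its inputs are hypothetical.

```python
import itertools


def max_error_differential(f, x, deltas, h=1e-6):
    """Maximum error dy = sum_i |df/dx_i| * delta_i (partials by central differences)."""
    dy = 0.0
    for i, d in enumerate(deltas):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        dy += abs((f(up) - f(down)) / (2 * h)) * d
    return dy


def range_brute_force(f, x, deltas):
    """(y_min, y_max) over every combination of x_i - delta_i and x_i + delta_i.

    Assumes f is monotonic in each variable over the range, so the
    extremes occur at these corner combinations.
    """
    limits = [(xi - d, xi + d) for xi, d in zip(x, deltas)]
    values = [f(corner) for corner in itertools.product(*limits)]
    return min(values), max(values)


def model(v):
    """Hypothetical model y = x1 * x2 / x3 (not from the slides)."""
    return v[0] * v[1] / v[2]


x = [2.0, 3.0, 4.0]
deltas = [0.1, 0.2, 0.4]
y = model(x)
dy = max_error_differential(model, x, deltas)
print(f"differential: {y:.3f} +/- {dy:.3f}")
print("brute force: %.3f to %.3f" % range_brute_force(model, x, deltas))
```

Here the differential method gives 1.500 ± 0.325, while brute force gives 1.209 to 1.867. The two methods agree closely, but the brute-force range is not centered on y.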

Example 4: Differential method

m = ρAv (y = m; x1 = ρ, x2 = A, x3 = v)

x1 = ρ = 2.0 g/cm³ (table); δ1 = 0.257 g/cm³ (Rule 2)

x2 = A = 3.4 cm² (measured avg.); δ2 = 0.2 cm² (Rule 1)

x3 = v = 2 cm/s (calibration); δ3 = 0.1 cm/s (Rule 4)

y = (2.0)(3.4)(2) = 13.6 g/s

dy = (Av)δ1 + (ρv)δ2 + (ρA)δ3 = (6.8)(0.257) + (4.0)(0.2) + (6.8)(0.1) = 3.2 g/s

Ψ = 13.6 ± 3.2 g/s

Which product term contributes the most to uncertainty?

This method works only if errors are symmetrical
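The sketch below recomputes Example 4 term by term to answer the question of which product term contributes most. It also applies the brute force method to the same inputs for comparison. Only the slide's numbers are used. Because all three factors are positive, the brute-force extremes fall at the all-low and all-high combinations.

```python
rho, area, v = 2.0, 3.4, 2.0          # g/cm^3, cm^2, cm/s
d_rho, d_area, d_v = 0.257, 0.2, 0.1  # deltas from Rules 2, 1, 4

m = rho * area * v  # 13.6 g/s

# Each term is |dm/dx_i| * delta_i for m = rho * A * v
terms = {
    "rho": area * v * d_rho,  # 6.8 * 0.257
    "A": rho * v * d_area,    # 4.0 * 0.2
    "v": rho * area * d_v,    # 6.8 * 0.1
}
dm = sum(terms.values())
largest = max(terms, key=terms.get)

# Brute force: all factors positive, so extremes are at the all-low / all-high corners
m_min = (rho - d_rho) * (area - d_area) * (v - d_v)
m_max = (rho + d_rho) * (area + d_area) * (v + d_v)

print(f"m = {m:.1f} +/- {dm:.1f} g/s; largest term: {largest}")
print(f"brute force: {m_min:.2f} to {m_max:.2f} g/s")
```

The density term (about 1.75 g/s) dominates, so a better value of ρ would reduce the uncertainty most. The brute-force bounds, 10.60 to 17.06 g/s, sit about 3.00 g/s below and 3.46 g/s above 13.6 g/s. Because m is a product, the symmetric ±3.2 g/s from the differential method only approximates them.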