Stats of Engineers, Lecture 8

Confidence interval width

A 95% confidence interval for the mean resistance of a component was constructed using a random sample of size n = 10, giving [...]. Which of the following conditions would probably NOT lead to a wider confidence interval (bigger error bar)? (You can click more than one.)
• If the sample mean was larger
• If you increased your confidence level
• If you increased your sample size
• If the population standard deviation was larger

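Recap: Confidence Intervals for the mean

Normal data, variance known or large data sample: use normal tables.
A confidence interval for $\mu$ at confidence level $1-\alpha$, if we measure a sample mean $\bar{x}$ and already know $\sigma^2$, is
$\bar{x} \pm z\,\frac{\sigma}{\sqrt{n}}$, where $Q(z) = 1 - \alpha/2$.

Normal data, variance unknown: use t-distribution tables.
If $\sigma^2$ is unknown, we need to make two changes: (i) estimate $\sigma^2$ by $s^2$, the sample variance; (ii) replace $z$ by $t_{n-1}$, the value obtained from t-tables.
The confidence interval for $\mu$, if we measure a sample mean $\bar{x}$ and sample variance $s^2$, is
$\bar{x} \pm t_{n-1}\,\frac{s}{\sqrt{n}}$, where $Q(t_{n-1}) = 1 - \alpha/2$ [$\nu = n-1$ d.o.f.]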
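The two recipes in the recap can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `ci_known_sigma` and `ci_unknown_sigma` and all numbers are made up for the example, and the t value is typed in from a t-table rather than computed.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci_known_sigma(xbar, sigma, n, confidence=0.95):
    """(lower, upper) for the mean when sigma is known (normal tables)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Q(z) = 1 - alpha/2
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def ci_unknown_sigma(sample, t_value):
    """(lower, upper) for the mean using the sample standard deviation s.

    t_value is t_{n-1}, read from t-tables for the chosen confidence level.
    """
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)  # square root of the sample variance (n - 1 divisor)
    half_width = t_value * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Illustrative numbers only (not from the lecture).
lo95, hi95 = ci_known_sigma(xbar=100.0, sigma=2.0, n=10, confidence=0.95)
lo99, hi99 = ci_known_sigma(xbar=100.0, sigma=2.0, n=10, confidence=0.99)
lo_n40, hi_n40 = ci_known_sigma(xbar=100.0, sigma=2.0, n=40, confidence=0.95)
lo_shift, hi_shift = ci_known_sigma(xbar=200.0, sigma=2.0, n=10, confidence=0.95)

sample = [9.8, 10.1, 10.4, 9.9, 10.3, 10.0, 9.7, 10.2, 10.1, 9.5]
t_9 = 2.262  # t-table value for 9 d.o.f. at Q = 0.975
lo_t, hi_t = ci_unknown_sigma(sample, t_9)

print(hi95 - lo95, hi99 - lo99, hi_n40 - lo_n40, hi_shift - lo_shift)
print(lo_t, hi_t)
```

The printed widths show the dependence on confidence level and sample size; the width does not depend on the sample mean itself.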

For large $n$ the t-distribution tends to the Normal; in general it has broader tails.
[Figure: Normal and t-distribution densities compared]
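The convergence to the Normal can be checked numerically. The sketch below is an assumption-laden stand-in for t-tables: it integrates the standard Student-t density with Simpson's rule and inverts it by bisection (the helper names `t_pdf`, `t_cdf`, `t_quantile` are invented here, and the search range assumes at least 2 degrees of freedom).

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_norm = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_norm - (nu + 1) / 2 * log(1 + x * x / nu))

def t_cdf(x, nu, steps=1000):
    """P(T <= x) for x >= 0: one half plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_quantile(q, nu, upper=50.0):
    """Value t with P(T <= t) = q, for 0.5 <= q < 1, found by bisection."""
    lo, hi = 0.0, upper
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
table = {nu: t_quantile(0.975, nu) for nu in (2, 5, 10, 30, 100)}
for nu, t in table.items():
    print(f"nu = {nu:3d}: t = {t:.3f}   (Normal z = {z:.3f})")
```

The critical values shrink towards the Normal value as the degrees of freedom grow, which is the broader-tails statement in numbers.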

Linear regression

We measure a response variable $y$ at various values of a controlled variable $x$.
Linear regression: fitting a straight line to the mean value of $y$ as a function of $x$.

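Equation of the fitted line is $\hat{y} = \hat{a} + \hat{b}x$.
Least-squares estimates $\hat{a}$ and $\hat{b}$:
$\hat{b} = \frac{S_{xy}}{S_{xx}}$, $\hat{a} = \bar{y} - \hat{b}\bar{x}$,
where $S_{xx} = \sum_i (x_i - \bar{x})^2$ and $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$.
Sample means: $\bar{x} = \frac{1}{n}\sum_i x_i$, $\bar{y} = \frac{1}{n}\sum_i y_i$.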
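As a minimal sketch of the least-squares estimates, the function below computes the intercept and slope from the sample means and the sums $S_{xx}$ and $S_{xy}$. The name `fit_line` and the data are illustrative assumptions, not the lecture's example.

```python
def fit_line(xs, ys):
    """Least-squares intercept a_hat and slope b_hat for y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx            # slope
    a_hat = y_bar - b_hat * x_bar  # line passes through (x_bar, y_bar)
    return a_hat, b_hat

# Illustrative data (not from the lecture): points scattered about y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a_hat, b_hat = fit_line(xs, ys)
print(a_hat, b_hat)
```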

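Quantifying the goodness of the fit

Estimating $\sigma^2$, the variance of $y$ about the fitted line:
$\hat{\sigma}^2 = \frac{\mathrm{RSS}}{n-2}$
Residual sum of squares: $\mathrm{RSS} = \sum_i (y_i - \hat{a} - \hat{b}x_i)^2 = S_{yy} - \hat{b}S_{xy}$, where $S_{yy} = \sum_i (y_i - \bar{y})^2$.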
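A short sketch of the variance estimate, assuming the usual $n-2$ divisor for a two-parameter straight-line fit. It computes the residual sum of squares directly from the residuals and also via the shortcut $S_{yy} - \hat{b}S_{xy}$; the function name and data are made up for illustration.

```python
def residual_variance(xs, ys):
    """Return (sigma2_hat, rss, rss_shortcut) for a least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    # Direct definition: squared vertical distances from the fitted line.
    rss = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys))
    # Algebraically equivalent shortcut.
    rss_shortcut = s_yy - b_hat * s_xy
    return rss / (n - 2), rss, rss_shortcut

# Illustrative data (not from the lecture).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
sigma2_hat, rss, rss_shortcut = residual_variance(xs, ys)
print(sigma2_hat, rss, rss_shortcut)
```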

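Predictions

For given $x$ of interest, what is mean $y$?
Predicted mean value: $\hat{y} = \hat{a} + \hat{b}x$.
What is the error bar? It can be shown that
$\mathrm{var}(\hat{y}) = \sigma^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]$
Confidence interval for mean $y$ at given $x$:
$\hat{y} \pm t_{n-2}\,\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}}$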
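The interval for the mean response can be sketched as follows. The function name `mean_response_ci` and the data are assumptions for illustration; the t value is passed in as read from t-tables with $n-2$ degrees of freedom.

```python
from math import sqrt

def mean_response_ci(xs, ys, x0, t_value):
    """Return (y0_hat, lower, upper) for the mean of y at x = x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    rss = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys))
    sigma_hat = sqrt(rss / (n - 2))
    y0_hat = a_hat + b_hat * x0
    half_width = t_value * sigma_hat * sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y0_hat, y0_hat - half_width, y0_hat + half_width

# Illustrative data (not from the lecture); n = 5, so n - 2 = 3 d.o.f.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
t_3 = 3.182  # t-table value for 3 d.o.f. at Q = 0.975
centre = mean_response_ci(xs, ys, 2.0, t_3)  # at x = x_bar
edge = mean_response_ci(xs, ys, 4.0, t_3)    # at the end of the data range
print(centre, edge)
```

The interval is narrowest at $x = \bar{x}$ and widens as $x$ moves away from the centre of the data.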

Example: The data y has been observed for various values of x, as follows: [data table not preserved]
Fit the simple linear regression model using least squares.

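Example: Using the previous data, what is the mean value of $y$ at $x = [...]$ and the 95% confidence interval?
Recall the fit was $\hat{y} = [...]$.
Confidence interval is $\hat{y} \pm t_{n-2}\,\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}}$.
Need $\hat{\sigma}^2 = [...]$ ⇒ $\hat{\sigma} = [...]$.
For 95% confidence, look up $t_{n-2}$ at Q = 0.975.
Hence the confidence interval for the mean $y$ is [...].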

Extrapolation: predictions outside the range of the original data

What is the prediction for mean $y$ at $x = [...]$?
[Figure sequence: the straight-line fit extended beyond the data first looks OK, then turns out to be quite wrong.]
Extrapolation is often unreliable unless you are sure a straight line is a good model.
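A small numerical illustration of the extrapolation warning, under an invented assumption: the true relation is the curve $y = x^2$, observed only over the narrow range 1 to 2, where a straight line fits reasonably well. None of this is the lecture's data.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx
    return y_bar - b_hat * x_bar, b_hat

def true_mean(x):
    """Hypothetical true relation (assumed for this example only)."""
    return x ** 2

xs = [1.0, 1.25, 1.5, 1.75, 2.0]  # narrow observed range
ys = [true_mean(x) for x in xs]
a_hat, b_hat = fit_line(xs, ys)

pred_inside = a_hat + b_hat * 1.5   # within the data range
pred_outside = a_hat + b_hat * 6.0  # far outside the data range

print("x = 1.5:", pred_inside, "vs true", true_mean(1.5))
print("x = 6.0:", pred_outside, "vs true", true_mean(6.0))
```

Inside the observed range the line is close to the curve; at x = 6 the straight-line prediction is less than half the true value.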

Confidence interval for a prediction

We previously calculated the confidence interval for the mean: if we average over many data samples of $y$ at $x$, this tells us the interval we expect the average to lie in. What about the distribution of future data points themselves?

Two effects:
- Variance on our estimate of the mean $y$ at $x$
- Variance of individual points about the mean

Confidence interval for a single response (measurement of $y$ at $x$) is
$\hat{y} \pm t_{n-2}\,\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}}$

Example: Using the previous data, what is the 95% confidence interval for a new measurement of $y$ at $x = [...]$?
Answer: [...]
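To see the two effects side by side, the sketch below returns the half-width of the interval for the mean and of the interval for a single new measurement. The function name, the data and the t-table value are illustrative assumptions.

```python
from math import sqrt

def interval_half_widths(xs, ys, x0, t_value):
    """Half-widths at x0: (interval for mean y, interval for one new y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    rss = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys))
    sigma_hat = sqrt(rss / (n - 2))
    fit_term = 1 / n + (x0 - x_bar) ** 2 / s_xx  # uncertainty in the fitted mean
    mean_half = t_value * sigma_hat * sqrt(fit_term)
    single_half = t_value * sigma_hat * sqrt(1 + fit_term)  # plus scatter of one point
    return mean_half, single_half

# Illustrative data (not from the lecture); 3 d.o.f., Q = 0.975.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
mean_half, single_half = interval_half_widths(xs, ys, 2.0, 3.182)
print(mean_half, single_half)
```

The single-response interval is always the wider of the two, because the extra 1 under the square root carries the scatter of an individual point about the mean.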

A linear regression line is fit to measured engine efficiency as a function of external temperature $T$ (in Celsius) at values $T = [...]$. Which of the following statements is most likely to be incorrect?
• The confidence interval for a new measurement of efficiency at $T = [...]$ is narrower than at $T = [...]$
• Adding a new data point at $T = [...]$ would decrease the confidence interval width at $T = [...]$
• If efficiency and $T$ accurately follow a linear regression model, adding more data points at $T = [...]$ and $T = [...]$ would be better than adding more at $T = [...]$ and $T = [...]$
• The mean engine efficiency at T = -20 will lie within the 95% confidence interval at T = -20 roughly 95% of the time

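Answer

Confidence interval for mean $y$ at given $x$: $\hat{y} \pm t_{n-2}\,\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}}$
Confidence interval for a single response (measurement of $y$ at $x$): $\hat{y} \pm t_{n-2}\,\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}}$
• Confidence interval narrower in the middle ($(x - \bar{x})^2$ smaller)
• Adding new data decreases uncertainty in the fit, so confidence intervals are narrower ($n$ and $S_{xx}$ larger)
• If the linear regression model is accurate, we get a better handle on the slope by adding data at both ends (bigger $S_{xx}$ ⇒ smaller confidence interval)
• Extrapolation is often unreliable: e.g. the linear model may well not hold at below-freezing temperatures. The confidence interval is unreliable at T = -20.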
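The first three points of the answer can be checked with the x-dependent factor $\sqrt{1/n + (x - \bar{x})^2/S_{xx}}$ alone. The temperatures below are hypothetical (the values used in the question are not reproduced here), and the sketch deliberately ignores any change in $\hat{\sigma}$ and $t_{n-2}$ when points are added.

```python
from math import sqrt

def width_factor(xs, x0):
    """sqrt(1/n + (x0 - x_bar)^2 / S_xx): x-dependent part of the mean-y interval."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)

base = [0.0, 10.0, 20.0, 30.0, 40.0]  # hypothetical temperatures
ends = base + [0.0, 40.0]             # two extra points at the ends
middle = base + [15.0, 25.0]          # two extra points near the middle

print("base, x = 20:", width_factor(base, 20.0))
print("base, x = 40:", width_factor(base, 40.0))
print("extra at ends, x = 40:", width_factor(ends, 40.0))
print("extra in middle, x = 40:", width_factor(middle, 40.0))
```

Both additions narrow the interval relative to the original design, but the points at the ends increase $S_{xx}$ far more and so help most away from the centre.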

Correlation

Regression tries to model the linear relation between mean y and x. Correlation measures the strength of the linear association between y and x.
[Figure: two scatter plots, one with weak correlation and one with strong correlation, having the same linear regression fit (with different confidence intervals)]

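If x and y are positively correlated:
- if x is high ($x > \bar{x}$), y is mostly high ($y > \bar{y}$)
- if x is low ($x < \bar{x}$), y is mostly low ($y < \bar{y}$)
so on average $(x - \bar{x})(y - \bar{y})$ is positive.

If x and y are negatively correlated:
- if x is high ($x > \bar{x}$), y is mostly low ($y < \bar{y}$)
- if x is low ($x < \bar{x}$), y is mostly high ($y > \bar{y}$)
so on average $(x - \bar{x})(y - \bar{y})$ is negative.

We can use $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$ to quantify the correlation.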

More convenient if the result is independent of units (dimensionless number). Define
$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$,
the Pearson product-moment correlation coefficient.
If $x \to ax$ with $a > 0$, then $S_{xy} \to aS_{xy}$ and $S_{xx} \to a^2S_{xx}$, so $r$ is unchanged. Similarly for $y$: stretching the plot does not affect $r$.
Range $-1 \le r \le 1$:
r = 1: there is a line with positive slope going through all the points;
r = -1: there is a line with negative slope going through all the points;
r = 0: there is no linear association between y and x.
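A minimal sketch of the correlation coefficient and its independence of units. The function name `pearson_r` and the data are illustrative assumptions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)

# Illustrative data (not from the lecture).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

r = pearson_r(xs, ys)
# Change of units for x (e.g. metres to millimetres): r should not move.
r_rescaled = pearson_r([1000.0 * x for x in xs], ys)
# Points exactly on a line give r = +1 or r = -1 depending on the slope sign.
r_up = pearson_r(xs, [1.0 + 2.0 * x for x in xs])
r_down = pearson_r(xs, [1.0 - 2.0 * x for x in xs])
print(r, r_rescaled, r_up, r_down)
```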

Example: from the previous data: [...]
Hence $r = [...]$.
Notes:
- the magnitude of r measures how noisy the data is, but not the slope
- finding $r = 0$ only means that there is no linear relationship, and does not imply the variables are independent
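The second note can be made concrete with a made-up example: y is completely determined by x (here $y = x^2$, an assumption chosen for illustration), yet the correlation coefficient is zero because the dependence is not linear.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]  # symmetric about zero
ys = [x ** 2 for x in xs]         # y depends on x exactly, but not linearly
r = pearson_r(xs, ys)
print(r)
```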

Correlation (question from Murphy et al.)

A researcher found that r = +0.92 between the high temperature of the day and the number of ice cream cones sold in Brighton. What does this information tell us?
• Higher temperatures cause people to buy more ice cream.
• Buying ice cream causes the temperature to go up.
• Some extraneous variable causes both high temperatures and high ice cream sales.
• Temperature and ice cream sales have a strong positive linear relationship.

Error on the estimated correlation coefficient?
- not easy; possibilities include subdividing the points and assessing the spread in r values.

Causation?
Correlation ($r \neq 0$) does not imply that changes in x cause changes in y - additional types of evidence are needed to see if that is true.

[Figure: correlation r with error] J Polit Econ. 2008; 116(3): 499–532. http://www.journals.uchicago.edu/doi/abs/10.1086/589524

Strong evidence for a 2-3% correlation.
- this doesn’t mean being tall causes you to earn more (though it could)
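The idea of subdividing the points to get a feel for the error on r can be sketched as below. This is only one possible reading of that suggestion: the data are simulated, and the interleaved split into five groups and the name `r_spread` are assumptions made for the example.

```python
import random
from math import sqrt
from statistics import stdev

def pearson_r(xs, ys):
    """Pearson product-moment correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)

def r_spread(xs, ys, groups=5):
    """r in each of `groups` interleaved subsets, and their standard deviation."""
    rs = [pearson_r(xs[i::groups], ys[i::groups]) for i in range(groups)]
    return rs, stdev(rs)

# Simulated data (not from the lecture): a linear trend plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 5) for x in xs]

r_all = pearson_r(xs, ys)
rs, spread = r_spread(xs, ys)
print("r from all points:", r_all)
print("r in each subset:", rs)
print("spread between subsets:", spread)
```

Each subset holds only a fifth of the points, so the spread between subsets overstates the uncertainty of r computed from the full data set; it is a rough guide rather than an error bar.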

Correlation

Which of the following scatter plots shows data with the most negative correlation $r$?
[Figure: four scatter plots, numbered 1 to 4]
Feedback on the options:
• No correlation
• Correct
• Not large
• Positive
