# Stats of Engineers, Lecture 8


Confidence interval width: A 95% confidence interval for the mean resistance of a component was constructed using a random sample of size n = 10. Which of the following conditions would probably NOT lead to a wider confidence interval (a bigger error bar)? (More than one answer may be chosen.)

• If the sample mean was larger

• If you increased your confidence level

• If you increased your sample size

• If the population standard deviation was larger

Recap: Confidence Intervals for the mean

Normal data, variance known or large data sample – use normal tables

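A confidence interval for $\mu$, if we measure a sample mean $\bar{x}$ and already know $\sigma$, is $\bar{x} \pm z\dfrac{\sigma}{\sqrt{n}}$, where $Q(z) = P(Z \le z) = 1 - \alpha/2$ for a $100(1-\alpha)\%$ interval (e.g. $Q(z) = 0.975$, so $z \approx 1.96$, for 95%).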
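As a minimal sketch, the known-variance interval can be computed with Python's standard-library `statistics.NormalDist`, whose `inv_cdf` returns the $z$ with $Q(z) = 0.975$ instead of reading it from normal tables. The function name `z_interval` and the resistance numbers below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int,
               confidence: float = 0.95) -> tuple[float, float]:
    """Interval for the mean mu when the population sigma is known."""
    q = 1 - (1 - confidence) / 2        # Q = 0.975 for a 95% interval
    z = NormalDist().inv_cdf(q)         # z such that Q(z) = q (about 1.96)
    half_width = z * sigma / sqrt(n)    # error bar on the sample mean
    return xbar - half_width, xbar + half_width


# Hypothetical sample: mean 100 ohm, known sigma = 5 ohm, n = 10
print(z_interval(100.0, 5.0, 10))
```

For these hypothetical numbers the interval is roughly 100 ± 3.10. The half-width depends on $\sigma$, $n$ and the confidence level, but not on $\bar{x}$.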

Normal data, variance unknown – use t-distribution tables


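If $\sigma$ is unknown, we need to make two changes:

(i) Estimate $\sigma^2$ by $s^2$, the sample variance;

(ii) replace $z$ by $t_{n-1}$, the value obtained from t-tables.

The confidence interval for $\mu$, if we measure a sample mean $\bar{x}$ and sample variance $s^2$, is $\bar{x} \pm t_{n-1}\dfrac{s}{\sqrt{n}}$, where $Q(t_{n-1}) = 1 - \alpha/2$ and $\nu = n - 1$ [degrees of freedom].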
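The lecture reads $t_{n-1}$ from printed tables. As a self-contained sketch (an assumption, not the course's method), the quantile can instead be found numerically: integrate the t density with Simpson's rule to get $Q(t)$, then bisect for $Q(t) = 0.975$. All function names here are hypothetical; in practice a library routine such as SciPy's `scipy.stats.t.ppf` does the same job.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(t: float, nu: int) -> float:
    """Density of Student's t-distribution with nu degrees of freedom."""
    c = exp(lgamma((nu + 1) / 2) - lgamma(nu / 2)) / sqrt(nu * pi)
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)


def t_cdf(t: float, nu: int, steps: int = 2000) -> float:
    """Q(t) = P(T <= t): Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3                # integral of the pdf from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def t_quantile(q: float, nu: int) -> float:
    """Solve Q(t) = q for 0.5 <= q < 1 by bisection (what t-tables list)."""
    lo, hi = 0.0, 1000.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(xbar: float, s: float, n: int,
               confidence: float = 0.95) -> tuple[float, float]:
    """Interval for mu when sigma is unknown and estimated by s."""
    q = 1 - (1 - confidence) / 2
    t = t_quantile(q, n - 1)            # nu = n - 1 degrees of freedom
    half_width = t * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical sample: mean 100 ohm, s = 5 ohm, n = 10 (so nu = 9)
print(t_interval(100.0, 5.0, 10))
```

For $\nu = 9$ this reproduces the tabulated $t = 2.262$, giving a wider interval than the known-$\sigma$ case; for $\nu = 1000$ it gives about 1.962, close to the Normal value 1.96.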

Figure: Normal and t-distribution densities compared.

For large $\nu$ the t-distribution tends to the Normal; in general it has broader tails.

Linear regression

We measure a response variable $y$ at various values of a controlled variable $x$.

Linear regression: fitting a straight line to the mean value of $y$ as a function of $x$.

Equation of the fitted line is $\hat{y} = \hat{\alpha} + \hat{\beta} x$.

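Least-squares estimates $\hat{\alpha}$ and $\hat{\beta}$: $\hat{\beta} = \dfrac{S_{xy}}{S_{xx}}$, $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$, where $S_{xx} = \sum_i (x_i - \bar{x})^2$ and $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$.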

Sample means: $\bar{x} = \frac{1}{n}\sum_i x_i$, $\bar{y} = \frac{1}{n}\sum_i y_i$.
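The least-squares estimates can be sketched in plain Python directly from the $S_{xx}$ and $S_{xy}$ sums. The name `fit_line` and the data are hypothetical (the lecture's own data table is not reproduced here).

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares intercept alpha_hat and slope beta_hat."""
    n = len(xs)
    xbar = sum(xs) / n                  # sample means
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = ybar - beta_hat * xbar  # fitted line passes through (xbar, ybar)
    return alpha_hat, beta_hat


# Hypothetical data (not the lecture's table)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(fit_line(xs, ys))
```

Here $S_{xx} = 10$ and $S_{xy} = 19.9$, so the fit is roughly $\hat{\alpha} = 0.05$, $\hat{\beta} = 1.99$.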

Quantifying the goodness of the fit

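Estimating $\sigma^2$: variance of $y$ about the fitted line,

$s^2 = \dfrac{1}{n-2}\sum_i (y_i - \hat{y}_i)^2 = \dfrac{S_{yy} - S_{xy}^2/S_{xx}}{n-2}$, where $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ and $S_{yy} = \sum_i (y_i - \bar{y})^2$.

Residual sum of squares: $\sum_i (y_i - \hat{y}_i)^2$.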
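A minimal sketch, on hypothetical data, computing $s^2$ both from the residual sum of squares and from the $S_{yy} - S_{xy}^2/S_{xx}$ shortcut; the name `residual_variance` is hypothetical. The divisor is $n - 2$ because two parameters were fitted.

```python
def residual_variance(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return s^2 from the residuals and from the shortcut formula."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = ybar - beta_hat * xbar
    # residual sum of squares about the fitted line
    rss = sum((y - (alpha_hat + beta_hat * x)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)                           # n - 2 degrees of freedom
    s2_short = (syy - sxy ** 2 / sxx) / (n - 2)  # equivalent shortcut
    return s2, s2_short


# Hypothetical data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(residual_variance(xs, ys))
```

Both routes give a residual sum of squares of 0.107, so $s^2 = 0.107/3 \approx 0.0357$.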

Predictions

For a given $x_0$ of interest, what is the mean $y$?

Predicted mean value: $\hat{y}_0 = \hat{\alpha} + \hat{\beta} x_0$.

What is the error bar?

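It can be shown that $\operatorname{var}(\hat{y}_0) = \sigma^2 \left[\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}\right]$.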

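Confidence interval for mean $y$ at given $x_0$: $\hat{y}_0 \pm t_{n-2}\, s \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}$, with $\nu = n - 2$ degrees of freedom.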
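A minimal sketch of this interval, assuming the user supplies $t_{n-2}$ from t-tables as the lecture does (parameter `t_crit`). The function name and data are hypothetical.

```python
from math import sqrt


def mean_response_ci(xs, ys, x0, t_crit):
    """CI for the mean of y at x0; t_crit is t_{n-2} read from t-tables."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = ybar - beta_hat * xbar
    rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    s = sqrt(rss / (n - 2))             # estimate of sigma about the line
    y0 = alpha_hat + beta_hat * x0      # predicted mean value at x0
    half_width = t_crit * s * sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
    return y0, y0 - half_width, y0 + half_width


# Hypothetical data: n = 5 points, so nu = 3 and t = 3.182 for Q = 0.975
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(mean_response_ci(xs, ys, 3.5, t_crit=3.182))
```

The half-width is smallest at $x_0 = \bar{x} = 3$ and grows as $x_0$ moves away from the centre of the data.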

Example:

The data y has been observed for various values of x, as follows:

Fit the simple linear regression model using least squares.

Example: Using the previous data, what is the mean value of $y$ at $x_0$, and its 95% confidence interval?

Recall fit was

The confidence interval is $\hat{y}_0 \pm t_{n-2}\, s \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}$.

Need

⇒ .

For 95% confidence, Q = 0.975.

Hence confidence interval for mean is

Extrapolation: predictions outside the range of the original data

What is the prediction for the mean $y$ at values of $x$ outside the range of the data?

- At one such value the prediction looks OK!

- At another it is quite wrong!

Extrapolation is often unreliable unless you are sure a straight line is a good model.
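To illustrate, here is a minimal sketch with hypothetical, noise-free data from a curved relation $y = x^2$, observed only for $0 \le x \le 1$. A straight line fitted there is then used to predict inside and far outside that range.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    return ybar - beta_hat * xbar, beta_hat


# Hypothetical curved relation y = x**2, sampled without noise on 0 <= x <= 1
xs = [i / 10 for i in range(11)]
ys = [x ** 2 for x in xs]
alpha_hat, beta_hat = fit_line(xs, ys)

for x0 in (0.5, 1.2, 5.0):
    predicted = alpha_hat + beta_hat * x0
    print(f"x0 = {x0}: line predicts {predicted:.2f}, true value {x0 ** 2:.2f}")
```

The fitted line is $\hat{y} = -0.15 + x$: at $x_0 = 0.5$ it predicts 0.35 against a true 0.25, but at $x_0 = 5$ it predicts 4.85 against a true 25.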

We previously calculated the confidence interval for the mean: if we average over many data samples of $y$ at $x_0$, this tells us the interval we expect the average to lie in.

What about the distribution of future data points themselves?

Confidence interval for a prediction

Two effects:

- Variance of our estimate of the mean at $x_0$

- Variance of individual points about the mean

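The confidence interval for a single response (a measurement of $y$ at $x_0$) is $\hat{y}_0 \pm t_{n-2}\, s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}$, where the extra 1 accounts for the scatter $\sigma^2$ of individual points about the mean.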
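A minimal sketch comparing the two half-widths on hypothetical data, again assuming `t_crit` is $t_{n-2}$ taken from tables; the function name is hypothetical.

```python
from math import sqrt


def interval_half_widths(xs, ys, x0, t_crit):
    """Half-widths at x0 of the mean-response CI and the single-response CI."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = ybar - beta_hat * xbar
    rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    s = sqrt(rss / (n - 2))
    factor = 1 / n + (x0 - xbar) ** 2 / sxx
    mean_half = t_crit * s * sqrt(factor)        # uncertainty in the fitted mean
    single_half = t_crit * s * sqrt(1 + factor)  # plus scatter of one new point
    return mean_half, single_half


# Hypothetical data: n = 5, nu = 3, t = 3.182 for Q = 0.975
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(interval_half_widths(xs, ys, 3.5, t_crit=3.182))
```

The single-response interval is always the wider one, and unlike the mean-response interval it does not shrink to zero as $n$ grows; it tends to $\pm t\, s$.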

Example: Using the previous data, what is the 95% confidence interval for a new measurement of $y$ at $x_0$?

A linear regression line is fitted to engine efficiency measured at several external temperatures T (in Celsius). Which of the following statements is most likely to be incorrect?

• The confidence interval for a new measurement of at is narrower than at

• Adding a new data point at would decrease the confidence interval width at

• If efficiency and T are accurately described by a linear regression model, adding more data points at and would be better than adding more at and

• The mean engine efficiency at T= -20 will lie within the 95% confidence interval at T=-20 roughly 95% of the time

Confidence interval for mean y at given x

Confidence interval for a single response (measurement of $y$ at $x_0$)

• Confidence interval is narrowest in the middle ($x_0 = \bar{x}$, where the $(x_0 - \bar{x})^2$ term vanishes)

• Adding new data decreases uncertainty in the fit, so confidence intervals are narrower ($n$ larger)

• If the linear regression model is accurate, you get a better handle on the slope by adding data at both ends (bigger $S_{xx}$, smaller confidence interval)

• Extrapolation often unreliable – e.g. linear model may well not hold at below-freezing temperatures. Confidence interval unreliable at T=-20.
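The first three points can be checked numerically with a minimal sketch of the factor $\sqrt{1/n + (x_0 - \bar{x})^2/S_{xx}}$ that multiplies $t_{n-2}\, s$. The two experimental designs below are hypothetical and share the same $n = 6$ and $\bar{x} = 5$; the comparison assumes a similar $s$ for both.

```python
from math import sqrt


def width_factor(xs, x0):
    """sqrt(1/n + (x0 - xbar)^2 / Sxx): scales the mean-response error bar."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sqrt(1 / n + (x0 - xbar) ** 2 / sxx)


# Two hypothetical designs with n = 6 and mean x = 5
clustered = [4.0, 4.0, 5.0, 5.0, 6.0, 6.0]   # Sxx = 4
at_ends = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]  # Sxx = 150

for x0 in (5.0, 10.0):
    print(x0, width_factor(clustered, x0), width_factor(at_ends, x0))
```

At $x_0 = 5$ both designs give $\sqrt{1/6} \approx 0.41$; at $x_0 = 10$ the clustered design gives about 2.53 against about 0.58 for the design with data at both ends.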

Correlation

Regression tries to model the linear relation between mean y and x.

Correlation measures the strength of the linear association between y and x.

Figure: weak correlation vs strong correlation, with the same linear regression fit (but different confidence intervals).

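If x and y are positively correlated:

- if x is high ($x > \bar{x}$), y is mostly high ($y > \bar{y}$)

- if x is low ($x < \bar{x}$), y is mostly low ($y < \bar{y}$)

so on average $(x - \bar{x})(y - \bar{y})$ is positive.

If x and y are negatively correlated:

- if x is high ($x > \bar{x}$), y is mostly low ($y < \bar{y}$)

- if x is low ($x < \bar{x}$), y is mostly high ($y > \bar{y}$)

so on average $(x - \bar{x})(y - \bar{y})$ is negative.

We can use $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$ to quantify the correlation.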
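A minimal sketch on hypothetical data showing that the sign of $S_{xy}$ follows the direction of the association, and that its size depends on the units, which motivates a dimensionless measure. The name `s_xy` is hypothetical.

```python
def s_xy(xs, ys):
    """Sum of (x - xbar)(y - ybar) over the data points."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [1.2, 1.9, 3.4, 3.8, 5.1]   # hypothetical, positively correlated
falling = list(reversed(rising))     # same values, negatively correlated
print(s_xy(xs, rising), s_xy(xs, falling))

# Measuring x in units 1000 times smaller multiplies S_xy by 1000
print(s_xy([1000 * x for x in xs], rising))
```

Here $S_{xy} = 9.7$ for the rising data and $-9.7$ for the falling data.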

More convenient if the result is independent of units (dimensionless number).

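Define $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$, the Pearson product-moment correlation coefficient, where $S_{xx} = \sum_i (x_i - \bar{x})^2$ and $S_{yy} = \sum_i (y_i - \bar{y})^2$.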
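A minimal sketch of $r$ in plain Python; `pearson_r` is a hypothetical name (Python 3.10+ also ships `statistics.correlation`, not used here), and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy), the Pearson product-moment coefficient."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(pearson_r(xs, ys))
```

For these data $S_{xy} = 19.9$, $S_{xx} = 10$ and $S_{yy} = 39.708$, giving $r \approx 0.9987$.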

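If $x \to \lambda x$ with $\lambda > 0$, then $r$ is unchanged ($S_{xy} \to \lambda S_{xy}$ and $\sqrt{S_{xx}} \to \lambda \sqrt{S_{xx}}$).

Similarly for $y$: stretching the plot does not affect $r$.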
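A minimal sketch checking the invariance on hypothetical data. It also shows two standard facts beyond the slide: shifting $x$ by a constant (e.g. Celsius to kelvin) leaves $r$ unchanged, while a negative $\lambda$ flips its sign.

```python
from math import sqrt


def pearson_r(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
r_scaled = pearson_r([1000 * x for x in xs], ys)     # stretch x (lambda = 1000)
r_shifted = pearson_r([x + 273.15 for x in xs], ys)  # shift x by a constant
r_flipped = pearson_r([-x for x in xs], ys)          # lambda = -1 flips the sign
print(r, r_scaled, r_shifted, r_flipped)
```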

Range: $-1 \le r \le 1$

r = 1: there is a line with positive slope going through all the points;

r = -1: there is a line with negative slope going through all the points;

r = 0: there is no linear association between y and x.

Example: from the previous data:

Hence

Notes:

- the magnitude of $r$ measures how noisy the data are about a straight line, but not the slope

- finding $r = 0$ only means that there is no linear relationship, and does not imply the variables are independent
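A minimal sketch of the second note, using hypothetical data where $y$ is completely determined by $x$ through $y = x^2$, yet $r = 0$ because the relation is not linear and the $x$ values are symmetric about $\bar{x}$.

```python
from math import sqrt


def pearson_r(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]      # y depends exactly on x, but not linearly
print(pearson_r(xs, ys))       # 0.0
```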

Question from Murphy et al.

Correlation: A researcher found that r = +0.92 between the high temperature of the day and the number of ice cream cones sold in Brighton. What does this information tell us?

• Higher temperatures cause people to buy more ice cream.

• Buying ice cream causes the temperature to go up.

• Some extraneous variable causes both high temperatures and high ice cream sales

• Temperature and ice cream sales have a strong positive linear relationship.

Error on the estimated correlation coefficient?

- not easy; possibilities include subdividing the points and assessing the spread in r values.
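A minimal sketch of the subdivision idea, under stated assumptions: the points are split into interleaved groups (so each group spans the whole $x$ range), $r$ is computed per group, and the spread of those values is reported. The data are hypothetical and the function name is illustrative. Each sub-sample is smaller than the full data set, so this spread overstates the uncertainty on the full-sample $r$; treat it as a rough guide only.

```python
import random
from math import sqrt
from statistics import mean, stdev


def pearson_r(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def subsample_r_values(xs, ys, groups=4):
    """Split points into interleaved groups and return r for each group."""
    return [pearson_r(xs[g::groups], ys[g::groups]) for g in range(groups)]


# Hypothetical noisy straight-line data (fixed seed for repeatability)
rng = random.Random(0)
xs = [i / 10 for i in range(80)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

rs = subsample_r_values(xs, ys)
print(f"r (all points) = {pearson_r(xs, ys):.3f}")
print(f"sub-sample r: mean {mean(rs):.3f}, spread (std dev) {stdev(rs):.3f}")
```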

Causation? A non-zero r does not imply that changes in x cause changes in y; additional types of evidence are needed to see if that is true.

Figure: estimated correlation r, with its error.

J Polit Econ. 2008; 116(3): 499–532.http://www.journals.uchicago.edu/doi/abs/10.1086/589524

Strong evidence for a 2-3% correlation.

- this doesn’t mean being tall causes you to earn more (though it could)

### Correlation: Which of the following scatter plots shows data with the most negative correlation?

Figure: scatter plots 1, 2, 3 and 4, with answer annotations "No correlation", "Correct", "Not large" and "positive".