Stats of Engineers, Lecture 8

Confidence interval width

A 95% confidence interval for the mean resistance of a component was constructed using a random sample of size n = 10. Which of the following conditions would probably NOT lead to a wider confidence interval (bigger error bar)? (More than one may apply.)

  • If the sample mean was larger

  • If you increased your confidence level

  • If you increased your sample size

  • If the population standard deviation was larger


Recap: Confidence Intervals for the mean

Normal data, variance known or large data sample – use normal tables

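A confidence interval for $\mu$ if we measure a sample mean $\bar{X}$ and already know $\sigma^2$ is $\bar{X} \pm z\sqrt{\sigma^2/n}$, where $Q(z) = 1 - \alpha/2$ for a confidence level of $100(1-\alpha)\%$.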
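A minimal Python sketch of the known-variance interval, showing how its width responds to the sample mean, confidence level, sample size and standard deviation. The function names `normal_ci` and `width` and all numbers are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def normal_ci(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known (or n is large)."""
    # z satisfies Q(z) = 1 - alpha/2, where Q is the standard normal CDF
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

def width(interval):
    """Full width of an interval given as (lower, upper)."""
    return interval[1] - interval[0]

# Hypothetical numbers, not taken from the lecture
base = normal_ci(sample_mean=100.0, sigma=2.0, n=10)
print("95% interval:", base)
print("width:", width(base))
print("larger sample mean:", width(normal_ci(120.0, 2.0, 10)))
print("99% confidence level:", width(normal_ci(100.0, 2.0, 10, confidence=0.99)))
print("n = 40:", width(normal_ci(100.0, 2.0, 40)))
print("sigma = 4:", width(normal_ci(100.0, 4.0, 10)))
```

In this sketch the width does not depend on the sample mean, grows with the confidence level and with sigma, and shrinks as n grows.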

Normal data, variance unknown – use t-distribution tables

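If $\sigma^2$ is unknown, we need to make two changes:

(i) Estimate $\sigma^2$ by $s^2$, the sample variance;

(ii) replace z by $t_{n-1}$, the value obtained from t-tables.

The confidence interval for $\mu$ if we measure a sample mean $\bar{X}$ and sample variance $s^2$ is $\bar{X} \pm t_{n-1}\sqrt{s^2/n}$, where $Q(t_{n-1}) = 1 - \alpha/2$. [$n-1$ d.o.f.]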
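The unknown-variance case can be sketched in Python as below. The critical values are the usual two-sided 95% entries (Q = 0.975) found in standard t-tables; the function name `t_ci_95` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# t-table values at Q = 0.975, indexed by degrees of freedom (n - 1)
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def t_ci_95(sample):
    """95% confidence interval for the mean when the variance is unknown."""
    n = len(sample)
    x_bar = mean(sample)
    s2 = variance(sample)          # sample variance, divisor n - 1
    t = T_975[n - 1]               # look up t with n - 1 degrees of freedom
    half_width = t * sqrt(s2 / n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical measurements, for illustration only
sample = [9.8, 10.2, 10.1, 9.9, 10.0]
print(t_ci_95(sample))
```

The table only covers up to 10 degrees of freedom; a larger sample would need more table entries or a statistics library.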


[Figure: Normal and t-distribution curves]

For large n the t-distribution tends to the Normal; in general it has broader tails.


Linear regression

We measure a response variable y at various values of a controlled variable x.

Linear regression: fitting a straight line to the mean value of y as a function of x.


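Equation of the fitted line is $\hat{y} = \hat{a} + \hat{b}x$

Least-squares estimates $\hat{a}$ and $\hat{b}$: $\hat{b} = S_{xy}/S_{xx}$ and $\hat{a} = \bar{y} - \hat{b}\bar{x}$, where $S_{xx} = \sum_i (x_i - \bar{x})^2$ and $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$

Sample means: $\bar{x} = \frac{1}{n}\sum_i x_i$, $\bar{y} = \frac{1}{n}\sum_i y_i$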
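A short Python sketch of the standard least-squares estimates for a straight-line fit (slope = S_xy / S_xx, intercept = y_bar - slope * x_bar). The name `fit_line` and the data are hypothetical, not the lecture's example data.

```python
def fit_line(xs, ys):
    """Return (a_hat, b_hat) for the fitted line y = a_hat + b_hat * x."""
    n = len(xs)
    x_bar = sum(xs) / n                      # sample means
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx                      # slope
    a_hat = y_bar - b_hat * x_bar            # intercept: line passes through the means
    return a_hat, b_hat

# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a_hat, b_hat = fit_line(xs, ys)
print("intercept:", a_hat, "slope:", b_hat)
```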


Quantifying the goodness of the fit

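Estimating $\sigma^2$, the variance of y about the fitted line: $\hat{\sigma}^2 = \frac{1}{n-2}\sum_i (y_i - \hat{a} - \hat{b}x_i)^2$

Residual sum of squares: $\sum_i (y_i - \hat{a} - \hat{b}x_i)^2 = S_{yy} - \hat{b}S_{xy}$, where $S_{yy} = \sum_i (y_i - \bar{y})^2$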
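As a sketch, the residual sum of squares and the resulting variance estimate (dividing by n - 2, since two parameters are fitted) can be computed as follows. The function name `residual_variance` and the data are hypothetical.

```python
def residual_variance(xs, ys):
    """Return (rss, sigma2_hat) for a least-squares straight-line fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    # Sum of squared vertical distances of the points from the fitted line
    rss = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys))
    return rss, rss / (n - 2)

# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
rss, sigma2_hat = residual_variance(xs, ys)
print("RSS:", rss, "estimated variance:", sigma2_hat)
```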


Predictions

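For given x of interest, what is mean y?

Predicted mean value: $\hat{y} = \hat{a} + \hat{b}x$.

What is the error bar?

It can be shown that $\mathrm{var}(\hat{y}) = \sigma^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]$

Confidence interval for mean y at given x: $\hat{y} \pm t_{n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]}$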
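A hedged Python sketch of the 95% confidence interval for the mean response, using the standard result var(y_hat) = sigma^2 [1/n + (x0 - x_bar)^2 / S_xx] and t-table values with n - 2 degrees of freedom. The function name `mean_response_ci_95` and the data are hypothetical.

```python
from math import sqrt

# t-table values at Q = 0.975, indexed by degrees of freedom (n - 2)
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306}

def mean_response_ci_95(xs, ys, x0):
    """95% confidence interval for the mean of y at x = x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    rss = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = rss / (n - 2)
    y_hat = a_hat + b_hat * x0
    # Variance of the fitted mean grows with distance of x0 from x_bar
    var_mean = sigma2_hat * (1.0 / n + (x0 - x_bar) ** 2 / s_xx)
    half_width = T_975[n - 2] * sqrt(var_mean)
    return y_hat - half_width, y_hat + half_width

# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print("at x = 3:", mean_response_ci_95(xs, ys, 3.0))
print("at x = 5:", mean_response_ci_95(xs, ys, 5.0))
```

The interval is narrowest at x0 = x_bar and widens towards the edges of the data.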


Example:

The data y has been observed for various values of x, as follows:

Fit the simple linear regression model using least squares.


Example: Using the previous data, what is the mean value of y at the x of interest, and the 95% confidence interval?

Recall fit was

Confidence interval is $\hat{y} \pm t_{n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]}$,

Need

⇒ .

95% confidence: read $t_{n-2}$ from t-tables at Q = 0.975

Hence confidence interval for mean is


Extrapolation: predictions outside the range of the original data

What is the prediction for mean y at an x outside the range of the data?



Looks OK!



Quite wrong!

Extrapolation is often unreliable unless you are sure straight line is a good model


We previously calculated the confidence interval for the mean: if we average over many data samples of y at x, this tells us the interval we expect the average to lie in.

What about the distribution of future data points themselves?

Confidence interval for a prediction

Two effects:

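- Variance on our estimate of mean y at x

- Variance of individual points about the mean

Confidence interval for a single response (measurement of y at x) is $\hat{y} \pm t_{n-2}\sqrt{\hat{\sigma}^2\left[1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]}$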
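The two effects can be compared in a small Python sketch: the interval for a single new measurement adds the variance of an individual point (the extra "1 +" term) to the variance of the fitted mean. The function name `half_widths_95` and the data are hypothetical.

```python
from math import sqrt

# t-table values at Q = 0.975, indexed by degrees of freedom (n - 2)
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306}

def half_widths_95(xs, ys, x0):
    """Return (y_hat, half-width for the mean, half-width for one new point)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    rss = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = rss / (n - 2)
    t = T_975[n - 2]
    fit_term = 1.0 / n + (x0 - x_bar) ** 2 / s_xx    # uncertainty of the fitted mean
    mean_hw = t * sqrt(sigma2_hat * fit_term)
    single_hw = t * sqrt(sigma2_hat * (1.0 + fit_term))  # plus scatter of one point
    return a_hat + b_hat * x0, mean_hw, single_hw

# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat, mean_hw, single_hw = half_widths_95(xs, ys, 3.0)
print("prediction:", y_hat)
print("mean: +/-", mean_hw, " single measurement: +/-", single_hw)
```

The interval for a single response is always the wider of the two, and it does not shrink to zero however many points are used in the fit.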

Example: Using the previous data, what is the 95% confidence interval for a new measurement of y at the x of interest?

Answer


A linear regression line is fitted to measured engine efficiency as a function of external temperature T (in Celsius) at several values of T. Which of the following statements is most likely to be incorrect?

  • The confidence interval for a new measurement of efficiency at a temperature near the middle of the data range is narrower than at one near the edge

  • Adding a new data point would decrease the confidence interval width at a given temperature

  • If efficiency and temperature accurately follow a linear regression model, adding more data points at the two extreme temperatures would be better than adding more at the intermediate ones

  • The mean engine efficiency at T = -20 will lie within the 95% confidence interval at T = -20 roughly 95% of the time


Answer

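Confidence interval for mean y at given x: $\hat{y} \pm t_{n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]}$

Confidence interval for a single response (measurement of y at x): $\hat{y} \pm t_{n-2}\sqrt{\hat{\sigma}^2\left[1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]}$

  • Confidence interval is narrower in the middle (near $x = \bar{x}$)

  • Adding new data decreases uncertainty in the fit, so confidence intervals are narrower ($n$ larger)

  • If the linear regression model is accurate, we get a better handle on the slope by adding data at both ends (bigger $S_{xx}$, smaller confidence interval)

  • Extrapolation is often unreliable, e.g. the linear model may well not hold at below-freezing temperatures. The confidence interval is unreliable at T = -20.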
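The first three points can be checked numerically with a small Python sketch of the factor 1/n + (x0 - x_bar)^2 / S_xx, which multiplies the variance in the interval for the mean. The temperature values below are hypothetical (the values used on the slide are not reproduced here), as are the function names.

```python
def s_xx(xs):
    """Sum of squared deviations of x from its mean."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)

def variance_factor(xs, x0):
    """Factor 1/n + (x0 - x_bar)^2 / S_xx that multiplies sigma^2 in var(y_hat)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return 1.0 / n + (x0 - x_bar) ** 2 / s_xx(xs)

# Hypothetical design: temperatures in Celsius
temps = [0.0, 10.0, 20.0, 30.0, 40.0]
ends = temps + [0.0, 40.0]        # two extra points at the extremes
middle = temps + [20.0, 20.0]     # two extra points in the middle

print("factor in the middle (T = 20):", variance_factor(temps, 20.0))
print("factor at the edge (T = 40):", variance_factor(temps, 40.0))
print("factor when extrapolating (T = -20):", variance_factor(temps, -20.0))
print("S_xx with extra end points:", s_xx(ends))
print("S_xx with extra middle points:", s_xx(middle))
```

The formula already gives a much wider interval at T = -20, and even that assumes the straight-line model still holds there, which the data cannot confirm.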


Correlation

Regression tries to model the linear relation between mean y and x.

Correlation measures the strength of the linear association between y and x.

[Figure: two scatter plots, one with weak correlation and one with strong correlation, with the same linear regression fit (with different confidence intervals)]


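If x and y are positively correlated:

- if x is high ($x > \bar{x}$), y is mostly high ($y > \bar{y}$)

- if x is low ($x < \bar{x}$), y is mostly low ($y < \bar{y}$)

on average $(x - \bar{x})(y - \bar{y})$ is positive

If x and y are negatively correlated:

- if x is high ($x > \bar{x}$), y is mostly low ($y < \bar{y}$)

- if x is low ($x < \bar{x}$), y is mostly high ($y > \bar{y}$)

on average $(x - \bar{x})(y - \bar{y})$ is negative

can use $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$ to quantify the correlation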


More convenient if the result is independent of units (dimensionless number).

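Define $r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$

This is the Pearson product-moment correlation coefficient.

If $x \to ax$ with $a > 0$, then $r$ is unchanged ($S_{xy} \to aS_{xy}$ and $S_{xx} \to a^2S_{xx}$).

Similarly for $y$: stretching the plot does not affect $r$.

Range: $-1 \le r \le 1$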

r = 1: there is a line with positive slope going through all the points;

r = -1: there is a line with negative slope going through all the points;

r = 0: there is no linear association between y and x.
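A minimal Python sketch of the Pearson coefficient, r = S_xy / sqrt(S_xx * S_yy), including a check that rescaling x by a positive factor leaves r unchanged. The function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(xs, ys)
r_scaled = pearson_r([100.0 * x for x in xs], ys)   # e.g. a change of units in x

print("r:", r)
print("r after rescaling x:", r_scaled)
print("exact line, negative slope:", pearson_r([0.0, 1.0, 2.0], [5.0, 3.0, 1.0]))
```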


Example: from the previous data:

Hence

Notes:

- the magnitude of r measures how noisy the data are, but not the slope

- finding r = 0 only means that there is no linear relationship, and does not imply the variables are independent
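A small illustration of the second note, as a sketch with hypothetical data: y is completely determined by x (y = x squared), yet r is zero because the relationship is not linear.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)

# x values symmetric about zero; y depends on x exactly, but not linearly
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print("r for y = x^2:", r)
```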


Question from Murphy et al.

Correlation

A researcher found that r = +0.92 between the high temperature of the day and the number of ice cream cones sold in Brighton. What does this information tell us?

  • Higher temperatures cause people to buy more ice cream.

  • Buying ice cream causes the temperature to go up.

  • Some extraneous variable causes both high temperatures and high ice cream sales

  • Temperature and ice cream sales have a strong positive linear relationship.


Error on the estimated correlation coefficient?

- not easy; possibilities include subdividing the points and assessing the spread in r values.
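One way to sketch the subdividing idea in Python: split the points into a few subsets, compute r in each, and look at the spread. The function name `r_by_subset`, the interleaved split, and the simulated data are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import stdev

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)

def r_by_subset(xs, ys, n_groups):
    """Split the points into n_groups interleaved subsets and return r for each."""
    return [pearson_r(xs[k::n_groups], ys[k::n_groups]) for k in range(n_groups)]

# Simulated noisy linear data (hypothetical), with a fixed seed for repeatability
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0.0, 5.0) for x in xs]

rs = r_by_subset(xs, ys, 4)
print("r from all points:", pearson_r(xs, ys))
print("r in each subset:", rs)
print("spread (standard deviation) of subset r values:", stdev(rs))
```

Each subset has fewer points than the full sample, so the spread seen here probably overstates the uncertainty of r computed from all the points.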

Causation? $r \neq 0$ does not imply that changes in x cause changes in y; additional types of evidence are needed to see if that is true.

[Figure: correlation r with error. Source: J Polit Econ. 2008; 116(3): 499–532. http://www.journals.uchicago.edu/doi/abs/10.1086/589524]


Strong evidence for a 2-3% correlation.

- this doesn’t mean being tall causes you to earn more (though it could)


Correlation

Which of the following scatter plots shows data with the most negative correlation?

[Figure: four scatter plots, numbered 1 to 4, annotated "No correlation", "Correct", "Not large" and "positive"]

