Stats of Engineers, Lecture 8


Confidence interval width

A 95% confidence interval for the mean resistance of a component was constructed using a random sample of size n = 10. Which of the following conditions would probably NOT lead to a wider confidence interval (bigger error bar)? (You can select more than one.)

  • If the sample mean was larger

  • If you increased your confidence level

  • If you increased your sample size

  • If the population standard deviation was larger


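Recap: Confidence Intervals for the mean

Normal data, variance known or large data sample – use normal tables

A confidence interval for $\mu$ if we measure a sample mean $\bar{X}$ and already know $\sigma^2$ is $\bar{X} \pm z\,\sigma/\sqrt{n}$, where $z$ is the normal-table value for the chosen confidence level ($z = 1.96$ for 95%).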
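How the error bar depends on $\sigma$, $n$ and the confidence level can be checked numerically. The following is a minimal Python sketch, assuming the known-variance interval $\bar{X} \pm z\,\sigma/\sqrt{n}$; the function name `z_half_width` and all numbers are illustrative and not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def z_half_width(sigma, n, confidence=0.95):
    """Half-width z * sigma / sqrt(n) of a two-sided interval for the mean."""
    # For a two-sided interval, z is the normal quantile at
    # 1 - (1 - confidence) / 2, e.g. 0.975 for 95% (z close to 1.96).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)


# Hypothetical values: sigma = 5, n = 10.
base = z_half_width(sigma=5.0, n=10)

print(z_half_width(sigma=5.0, n=10, confidence=0.99) > base)  # True: wider
print(z_half_width(sigma=10.0, n=10) > base)                  # True: wider
print(z_half_width(sigma=5.0, n=40) < base)                   # True: narrower
```

The sample mean does not appear in the half-width, so changing it shifts the interval without changing its width.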

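Normal data, variance unknown – use t-distribution tables

If $\sigma^2$ is unknown, we need to make two changes:

(i) Estimate $\sigma^2$ by $s^2$, the sample variance;

(ii) replace z by $t_{n-1}$, the value obtained from t-tables.

The confidence interval for $\mu$ if we measure a sample mean $\bar{X}$ and sample variance $s^2$ is $\bar{X} \pm t_{n-1}\, s/\sqrt{n}$, where $t_{n-1}$ is read from the t-table at cumulative probability $Q$ with $n-1$ degrees of freedom [d.o.f.]; for 95% confidence, $Q = 0.975$.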
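As an illustration of the unknown-variance case, here is a short Python sketch of $\bar{X} \pm t_{n-1}\, s/\sqrt{n}$. The t-value is passed in by hand, as if read from t-tables; the function name `t_interval` and the resistance readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(sample, t_value):
    """Interval x_bar +/- t_value * s / sqrt(n) for the mean, sigma unknown."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    half_width = t_value * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical resistance readings (n = 10), not from the lecture.
readings = [108.2, 111.5, 109.9, 110.4, 112.1, 107.8, 110.9, 109.3, 111.0, 108.9]

# t-table value for n - 1 = 9 degrees of freedom at Q = 0.975 (95% confidence).
T_9_975 = 2.262

low, high = t_interval(readings, T_9_975)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

Taking the t-value as an argument mirrors the table lookup used in the lecture and keeps the sketch free of third-party libraries.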


[Figure: Normal and t-distribution probability densities]

For large $n$ the t-distribution tends to the Normal; in general it has broader tails.


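Linear regression

We measure a response variable $y$ at various values of a controlled variable $x$

Linear regression: fitting a straight line to the mean value of $y$ as a function of $x$


Equation of the fitted line is $\hat{y} = \hat{a} + \hat{b}x$

Least-squares estimates $\hat{a}$ and $\hat{b}$: $\hat{b} = S_{xy}/S_{xx}$ and $\hat{a} = \bar{y} - \hat{b}\bar{x}$, where $S_{xx} = \sum_i (x_i - \bar{x})^2$ and $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$

Sample means: $\bar{x} = \frac{1}{n}\sum_i x_i$, $\bar{y} = \frac{1}{n}\sum_i y_i$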
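The least-squares estimates can be sketched in a few lines of Python, assuming the standard formulas $\hat{b} = S_{xy}/S_{xx}$ and $\hat{a} = \bar{y} - \hat{b}\bar{x}$ with $S_{xx} = \sum_i (x_i - \bar{x})^2$ and $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$. The function name `fit_line` and the data are illustrative only.

```python
def fit_line(xs, ys):
    """Return (a_hat, b_hat) for the least-squares line y = a_hat + b_hat * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Sanity check on noise-free hypothetical data lying exactly on y = 1 + 2x.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```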


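Quantifying the goodness of the fit

Estimating $\sigma^2$: variance of y about the fitted line

$\hat{\sigma}^2 = \frac{1}{n-2}\sum_i (y_i - \hat{a} - \hat{b}x_i)^2$

Residual sum of squares: $\sum_i (y_i - \hat{a} - \hat{b}x_i)^2$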
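A minimal Python sketch of the variance estimate, assuming the usual form $\hat{\sigma}^2 = \text{RSS}/(n-2)$ where RSS is the residual sum of squares about the least-squares line. The name `residual_variance` and the three data points are hypothetical.

```python
def residual_variance(xs, ys):
    """Estimate sigma^2 as (residual sum of squares) / (n - 2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    # Residuals are the vertical distances from each point to the fitted line.
    rss = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys))
    return rss / (n - 2)


# Hypothetical data: the fitted line is flat at y = 1/3, so RSS = 2/3.
print(round(residual_variance([0, 1, 2], [0, 1, 0]), 3))  # 0.667
```

Dividing by $n-2$ rather than $n$ reflects the two parameters estimated from the data.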


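Predictions

For given $x$ of interest, what is mean $y$?

Predicted mean value: $\hat{y} = \hat{a} + \hat{b}x$.

What is the error bar?

It can be shown that $\mathrm{var}(\hat{y}) = \sigma^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]$, with $S_{xx} = \sum_i (x_i - \bar{x})^2$

Confidence interval for mean y at given x: $\hat{a} + \hat{b}x \pm t_{n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]}$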
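The interval for the mean response can be sketched as follows, assuming the standard result $\hat{a} + \hat{b}x \pm t_{n-2}\sqrt{\hat{\sigma}^2\,[1/n + (x-\bar{x})^2/S_{xx}]}$. The t-value is supplied from tables; `mean_response_interval` and the data are hypothetical, not the lecture's example.

```python
from math import sqrt


def mean_response_interval(xs, ys, x0, t_value):
    """Confidence interval for the mean of y at x0 from a least-squares fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    sigma2_hat = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    y_hat = a_hat + b_hat * x0
    half_width = t_value * sqrt(sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / s_xx))
    return y_hat - half_width, y_hat + half_width


# Hypothetical data with n = 6, so n - 2 = 4 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.3, 3.8, 6.1, 8.2, 9.7, 12.1]
T_4_975 = 2.776  # t-table value for 4 d.o.f. at Q = 0.975

print(mean_response_interval(xs, ys, x0=3.5, t_value=T_4_975))
```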


Example:

The data y has been observed for various values of x, as follows:

Fit the simple linear regression model using least squares.


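Example: Using the previous data, what is the mean value of $y$ at a given $x$ and the 95% confidence interval?

Recall the fitted line $\hat{y} = \hat{a} + \hat{b}x$

Confidence interval is $\hat{a} + \hat{b}x \pm t_{n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]}$

Need $\hat{\sigma}^2$ and $t_{n-2}$.

95% confidence for Q=0.975: read $t_{n-2}$ from the t-table at $Q = 0.975$.

Hence confidence interval for mean $y$ is the predicted mean plus or minus this error bar.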


Extrapolation: predictions outside the range of the original data

What is the prediction for mean $y$ at an $x$ outside the measured range?

[Figure: fitted line extended beyond the data] Looks OK!

[Figure: the same plot with the actual values] Quite wrong!

Extrapolation is often unreliable unless you are sure a straight line is a good model
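A small numerical illustration of the danger, as a Python sketch on made-up data: the true relation is assumed to be $y = x^2$, but only points with $0 \le x \le 4$ are measured and a straight line is fitted to them. The helper `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Return (a_hat, b_hat) for the least-squares line y = a_hat + b_hat * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx
    return y_bar - b_hat * x_bar, b_hat


# Hypothetical data: curved truth, sampled over a narrow range only.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]
a_hat, b_hat = fit_line(xs, ys)  # a_hat = -2.0, b_hat = 4.0

for x0 in (2, 10):
    # Columns: x0, straight-line prediction, true value.
    print(x0, a_hat + b_hat * x0, x0 ** 2)
# prints "2 6.0 4" then "10 38.0 100"
```

Inside the measured range the line is off by 2; at $x = 10$ it is off by 62.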


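We previously calculated the confidence interval for the mean: if we average over many data samples of $y$ at $x$, this tells us the interval we expect the average $\bar{y}$ to lie in.

What about the distribution of future data points themselves?

Confidence interval for a prediction

Two effects:

- Variance on our estimate of mean $y$ at $x$: $\sigma^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]$

- Variance of individual points about the mean: $\sigma^2$

Confidence interval for a single response (measurement of $y$ at $x$) is $\hat{a} + \hat{b}x \pm t_{n-2}\sqrt{\hat{\sigma}^2\left[1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]}$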
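The two kinds of interval can be compared side by side. This Python sketch assumes the standard half-widths $t_{n-2}\sqrt{\hat{\sigma}^2[1/n + (x-\bar{x})^2/S_{xx}]}$ for the mean and $t_{n-2}\sqrt{\hat{\sigma}^2[1 + 1/n + (x-\bar{x})^2/S_{xx}]}$ for a single response; the name `half_widths` and the data are hypothetical.

```python
from math import sqrt


def half_widths(xs, ys, x0, t_value):
    """Return (half-width for mean y, half-width for one new y) at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    sigma2_hat = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    fit_term = 1 / n + (x0 - x_bar) ** 2 / s_xx  # uncertainty in the fitted mean
    mean_hw = t_value * sqrt(sigma2_hat * fit_term)
    single_hw = t_value * sqrt(sigma2_hat * (1 + fit_term))  # adds point scatter
    return mean_hw, single_hw


# Hypothetical data, n = 6, so the t-table value for 4 d.o.f. at Q = 0.975 is 2.776.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.3, 3.8, 6.1, 8.2, 9.7, 12.1]
mean_hw, single_hw = half_widths(xs, ys, x0=3.5, t_value=2.776)
print(single_hw > mean_hw)  # True: a single measurement is always less certain
```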

Example: Using the previous data, what is the 95% confidence interval for a new measurement of $y$ at a given $x$?

Answer


A linear regression line is fit to measured engine efficiency $y$ as a function of external temperature $T$ (in Celsius) at several values of $T$. Which of the following statements is most likely to be incorrect?

  • The confidence interval for a new measurement of $y$ at a temperature in the middle of the measured range is narrower than at one near the edge

  • Adding a new data point at one temperature would decrease the confidence interval width at another

  • If $y$ and $T$ accurately follow a linear regression model, adding more data points at the two ends of the range would be better than adding more in the middle

  • The mean engine efficiency at T=-20 will lie within the 95% confidence interval at T=-20 roughly 95% of the time


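Answer

Confidence interval for mean y at given x: $\hat{a} + \hat{b}x \pm t_{n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]}$

Confidence interval for a single response (measurement of $y$ at $x$): $\hat{a} + \hat{b}x \pm t_{n-2}\sqrt{\hat{\sigma}^2\left[1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]}$

  • Confidence interval narrower in the middle ($x$ near $\bar{x}$)

  • Adding new data decreases uncertainty in fit, so confidence intervals narrower ($n$ larger)

  • If linear regression model accurate, get better handle on the slope by adding data at both ends (bigger $S_{xx}$, smaller confidence interval)

  • Extrapolation often unreliable – e.g. linear model may well not hold at below-freezing temperatures. Confidence interval unreliable at T=-20.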
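The first three points can be checked with the factor $1/n + (T - \bar{T})^2/S_{TT}$ that multiplies $\hat{\sigma}^2$ in the interval for the mean. The Python sketch below uses made-up temperatures (the lecture's values are not reproduced here) and ignores changes in $\hat{\sigma}^2$ and in the t-value, so it is only a rough guide.

```python
def interval_factor(ts, t0):
    """Return 1/n + (t0 - t_bar)^2 / S_tt for design points ts."""
    n = len(ts)
    t_bar = sum(ts) / n
    s_tt = sum((t - t_bar) ** 2 for t in ts)
    return 1 / n + (t0 - t_bar) ** 2 / s_tt


# Hypothetical design temperatures in Celsius.
base = [0, 10, 20, 30, 40]
ends = base + [0, 40]      # two extra points at the ends of the range
middle = base + [20, 20]   # two extra points in the middle

print(interval_factor(base, 20) < interval_factor(base, 40))    # True: narrower in the middle
print(interval_factor(middle, 40) < interval_factor(base, 40))  # True: more data helps
print(interval_factor(ends, 40) < interval_factor(middle, 40))  # True: ends help more
```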


Correlation

Regression tries to model the linear relation between mean y and x.

Correlation measures the strength of the linear association between y and x.

[Figure: two scatter plots, one with weak correlation and one with strong correlation, both with the same linear regression fit (with different confidence intervals)]


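If x and y are positively correlated:

- if x is high ($x > \bar{x}$), y is mostly high ($y > \bar{y}$)

- if x is low ($x < \bar{x}$), y is mostly low ($y < \bar{y}$)

so on average $(x - \bar{x})(y - \bar{y})$ is positive

If x and y are negatively correlated:

- if x is high ($x > \bar{x}$), y is mostly low ($y < \bar{y}$)

- if x is low ($x < \bar{x}$), y is mostly high ($y > \bar{y}$)

so on average $(x - \bar{x})(y - \bar{y})$ is negative

We can use $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$ to quantify the correlation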


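More convenient if the result is independent of units (dimensionless number).

Define $r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$, where $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$, $S_{xx} = \sum_i (x_i - \bar{x})^2$ and $S_{yy} = \sum_i (y_i - \bar{y})^2$

Pearson product-moment correlation coefficient.

If $x \to ax$ with $a > 0$, then $r$ is unchanged ($S_{xy} \to aS_{xy}$, $S_{xx} \to a^2 S_{xx}$).

Similarly for $y$ - stretching the plot does not affect $r$.

Range $-1 \le r \le 1$:

r = 1: there is a line with positive slope going through all the points;

r = -1: there is a line with negative slope going through all the points;

r = 0: there is no linear association between y and x.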
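A short Python sketch of the correlation coefficient, assuming the definition $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$, with a check that rescaling $x$ by a positive factor leaves $r$ unchanged. The function name `pearson_r` and the data are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data, roughly linear with some scatter.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
r_scaled = pearson_r([100 * x for x in xs], ys)  # e.g. metres -> centimetres
print(abs(r - r_scaled) < 1e-12)  # True: r does not depend on the units of x
```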


Example: from the previous data:

Hence

Notes:

- magnitude of r measures how noisy the data is, but not the slope

- finding $r = 0$ only means that there is no linear relationship, and does not imply the variables are independent
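The last note can be seen in a tiny made-up example: $y$ is completely determined by $x$ (here $y = x^2$), yet $r = 0$ because the relationship is not linear. A Python sketch, with `pearson_r` as an illustrative helper:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data: y depends on x exactly, but symmetrically about x = 0.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

print(pearson_r(xs, ys))  # 0.0
```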


Question from Murphy et al.

Correlation

A researcher found that r = +0.92 between the high temperature of the day and the number of ice cream cones sold in Brighton. What does this information tell us?

  • Higher temperatures cause people to buy more ice cream.

  • Buying ice cream causes the temperature to go up.

  • Some extraneous variable causes both high temperatures and high ice cream sales.

  • Temperature and ice cream sales have a strong positive linear relationship.


Error on the estimated correlation coefficient?

- not easy; possibilities include subdividing the points and assessing the spread in r values.
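One possible reading of "subdividing the points" is sketched below in Python: split the data into $k$ subsets, compute $r$ on each, and look at the spread. The data are simulated, the interleaved split assumes the points are in no particular order, and `r_spread` is a hypothetical helper, so this is an illustration rather than a recommended procedure.

```python
import random
from math import sqrt
from statistics import stdev


def pearson_r(xs, ys):
    """Pearson product-moment correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


def r_spread(xs, ys, k):
    """Split the points into k interleaved subsets and return r for each."""
    return [pearson_r(xs[i::k], ys[i::k]) for i in range(k)]


# Simulated (hypothetical) data: a linear trend plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 * x + rng.gauss(0, 5) for x in xs]

rs = r_spread(xs, ys, k=5)
print("r on all points:", round(pearson_r(xs, ys), 3))
print("r on each subset:", [round(r, 3) for r in rs])
print("spread of subset r values:", round(stdev(rs), 3))
```

The spread describes subsets of 40 points each; the uncertainty on $r$ from all 200 points would be expected to be smaller.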

Causation? A non-zero $r$ does not imply that changes in x cause changes in y - additional types of evidence are needed to see if that is true.

[Figure: correlation r and its error]

J Polit Econ. 2008; 116(3): 499–532. http://www.journals.uchicago.edu/doi/abs/10.1086/589524


Strong evidence for a 2-3% correlation.

- this doesn’t mean being tall causes you to earn more (though it could)


Correlation

Which of the following scatter plots shows data with the most negative correlation $r$?

[Figure: four scatter plots, numbered 1 to 4]

Answer labels on the plots: No correlation; Correct; Not large; Positive
