Simple Linear Regression and Correlation: Inferential Methods

Chapter 13, AP Statistics (Peck, Olsen and Devore)

Topic 2: Summary of Bivariate Data • In Topic 2 we discussed summarizing bivariate data • Specifically we were interested in summarizing linear relationships between two measurable characteristics • We summarized these linear relationships by performing a linear regression using the method of least squares

Least Squares Regression • Graphically display the data in a scatterplot (form, strength and direction) • Calculate Pearson’s correlation coefficient (the strength of the linear association) • Perform the least squares regression • Inspect the residual plot to determine if the model is appropriate (no patterns) • Determine the coefficient of determination (how good the model is as a prediction tool) • Use the model as a prediction tool
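
The numerical steps in this checklist can be sketched in a few lines of Python. This is only a minimal illustration: the data values are invented for the example, and the variable names (`s_xx`, `s_xy`, `ss_resid`, and so on) are assumptions rather than notation taken from the slides.

```python
from math import sqrt

# Hypothetical toy data, invented purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pearson's correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# Least squares slope and intercept.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residuals (observed minus predicted) and coefficient of determination.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
ss_resid = sum(e ** 2 for e in residuals)
r_squared = 1 - ss_resid / s_yy

print(f"r = {r:.4f}, slope b = {b:.4f}, intercept a = {a:.4f}")
print(f"r^2 = {r_squared:.4f}")
print("residuals:", [round(e, 2) for e in residuals])
```

Plotting `residuals` against `x` would give the residual plot; with only five invented points it is not meaningful to look for patterns here.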

Interpretation • Pearson’s correlation coefficient, $r$ • Coefficient of determination, $r^2$ • Variables in $\hat{y} = a + bx$ • Standard deviation of the residuals, $s_e$

Minitab Output

Simple Linear Regression Model • ‘Simple’ because we had only one independent variable • We interpreted $\hat{y}$ as a predicted value of y given a specific value of x • When $y = \alpha + \beta x$ we can describe this as a deterministic model. That is, the value of y is completely determined by a given value of x • That wasn’t really the case when we used our linear regressions. The value of y was equal to our predicted value +/- some amount. That is, $y = \alpha + \beta x + e$, where e is a random deviation. We call this a probabilistic model. • So, without e, the (x, y) pairs (observed points) would fall exactly on the regression line.
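
To make the distinction concrete, the sketch below generates y values from a line with and without a random deviation e. The population values `alpha` and `beta`, the spread `sigma`, and the helper name `simulate_y` are all hypothetical choices for illustration, and e is drawn from a normal distribution as an assumption.

```python
import random

def simulate_y(alpha, beta, x_values, sigma, seed=0):
    """Return y = alpha + beta*x + e for each x, with e ~ Normal(0, sigma)."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    return [alpha + beta * x + rng.gauss(0, sigma) for x in x_values]

# Hypothetical population line, chosen only for illustration.
alpha, beta = 2.0, 0.5
x_values = [1, 2, 3, 4, 5]

line = [alpha + beta * x for x in x_values]

# sigma = 0 removes the random deviation: the deterministic model.
deterministic = simulate_y(alpha, beta, x_values, sigma=0.0)

# sigma > 0 scatters the points about the line: the probabilistic model.
probabilistic = simulate_y(alpha, beta, x_values, sigma=1.0)

print("on the line:   ", deterministic)
print("about the line:", [round(v, 2) for v in probabilistic])
```

With `sigma=0.0` every simulated point lands on the line, which is the "without e" case described above.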

Now consider this … • How did we calculate the coefficients in our linear regression models? • We were actually estimating population parameters using a sample. That is, the simple linear regression $\hat{y} = a + bx$ is an estimate for the population regression line $y = \alpha + \beta x$ • We can consider $a$ and $b$ estimates for $\alpha$ and $\beta$

Basic Assumptions for the Simple Linear Regression Model • The distribution of e at any particular value of x has a mean value of 0. That is, $\mu_e = 0$ • The standard deviation of e is the same for any value of x. It is always denoted by $\sigma$ • The distribution of e at any value of x is normal • The random deviations are independent.

Another interpretation of $\hat{y}$ • Consider $y = \alpha + \beta x + e$, where the coefficients are fixed and e is distributed normally. Then the sum of a fixed number and a normally distributed variable is normally distributed (Chapter 7). So y is normally distributed. • Now the mean of y will be equal to $\alpha + \beta x$ plus the mean of e, which is equal to 0 • So another interpretation is the mean y value for a given x value, which is $\alpha + \beta x$

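Distribution of y • Where $y = \alpha + \beta x + e$, we can now see that y is distributed normally with a mean of $\alpha + \beta x$ • The variance for y is the same as the variance of e, which is $\sigma^2$ • An estimate for $\sigma^2$ is $s_e^2 = \dfrac{\text{SSResid}}{n-2}$, where SSResid is the sum of the squared residuals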
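
As a rough sketch of how the estimate of the standard deviation about the line could be computed, the function below takes the square root of the sum of squared residuals divided by n - 2. The data are invented for illustration and the function name `residual_standard_deviation` is hypothetical.

```python
from math import sqrt

def residual_standard_deviation(x, y):
    """Estimate sigma by s_e = sqrt(SSResid / (n - 2))."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx          # least squares slope
    a = y_bar - b * x_bar    # least squares intercept
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Two degrees of freedom are used up estimating the slope and intercept.
    return sqrt(ss_resid / (n - 2))

# Hypothetical toy data, invented purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_e = residual_standard_deviation(x, y)
print(f"s_e = {s_e:.4f}")
```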

Assumption • The major assumption underlying all of this is that the random deviation e is normally distributed. • We’ll talk more later about how to judge whether this assumption is reasonable.

Inferences about the slope of the population regression line • Now we are going to make some inferences about the slope of the regression line. Specifically, we’ll construct a confidence interval and then perform a hypothesis test – a model utility test for simple linear regression

Just to repeat … • We said the population regression model is $y = \alpha + \beta x + e$ • The coefficients of this model are fixed but unknown (parameters), so using the method of least squares, we estimate these parameters using a sample of data (statistics) and we get $\hat{y} = a + bx$

Sampling distribution of b • We use b as an estimate for the population coefficient $\beta$ in the simple regression model • b is therefore a statistic determined by a random sample, and it has a sampling distribution

Sampling distribution of b • When the four assumptions of the linear regression model are met: • The mean value of the sampling distribution of b is $\beta$. That is, $\mu_b = \beta$ • The standard deviation of the statistic b is $\sigma_b = \dfrac{\sigma}{\sqrt{S_{xx}}}$, where $S_{xx} = \sum (x - \bar{x})^2$ • The sampling distribution of b is normal.
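
These properties can be checked informally by simulation. The sketch below draws many samples from a hypothetical population model with normal deviations, computes the least squares slope for each, and compares the mean and standard deviation of those slopes with the population slope and with sigma divided by the square root of the sum of squared x deviations. The population values, sample count, and function names are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

# Hypothetical population values, chosen only for illustration.
ALPHA, BETA, SIGMA = 1.0, 0.5, 2.0
X = list(range(1, 11))  # fixed x values 1..10

def least_squares_slope(x, y):
    """Return the least squares slope b = S_xy / S_xx."""
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx

def simulate_slopes(n_samples, seed=0):
    """Draw n_samples samples from the model and return each sample's slope."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        y = [ALPHA + BETA * xi + rng.gauss(0, SIGMA) for xi in X]
        slopes.append(least_squares_slope(X, y))
    return slopes

slopes = simulate_slopes(5000)

# Theoretical standard deviation of b for these x values.
x_bar = mean(X)
s_xx = sum((xi - x_bar) ** 2 for xi in X)
sigma_b = SIGMA / sqrt(s_xx)

print(f"mean of simulated slopes: {mean(slopes):.4f} (population slope {BETA})")
print(f"sd of simulated slopes:   {stdev(slopes):.4f} (theory {sigma_b:.4f})")
```

A histogram of `slopes` would be expected to look roughly bell-shaped, in line with the normality claim.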

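Estimates for … • The estimate for the standard deviation of b is $s_b = \dfrac{s_e}{\sqrt{S_{xx}}}$, where $s_e$ is the standard deviation of the residuals and $S_{xx} = \sum (x - \bar{x})^2$ • When we standardize b using this estimate, $t = \dfrac{b - \beta}{s_b}$ has a t distribution with n-2 degrees of freedom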

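Confidence Interval • Sample Statistic +/- Crit Value * Std Dev of Stat • For the slope: $b \pm (t \text{ critical value}) \cdot s_b$, where $s_b$ is the estimated standard deviation of b and the t critical value is based on n-2 degrees of freedom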
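
A minimal sketch of this interval for the slope follows. It assumes the t critical value is looked up in a t table for n - 2 degrees of freedom and passed in by hand; the data are invented, and the function name `slope_confidence_interval` is hypothetical.

```python
from math import sqrt

def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) for b +/- t_crit * s_b.

    t_crit is the t critical value for n - 2 degrees of freedom,
    supplied by the caller (for example, from a t table).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(ss_resid / (n - 2))   # standard deviation of the residuals
    s_b = s_e / sqrt(s_xx)           # estimated standard deviation of b
    margin = t_crit * s_b
    return b - margin, b + margin

# Hypothetical toy data, invented purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# 95% confidence with n - 2 = 3 degrees of freedom: table value 3.182.
lower, upper = slope_confidence_interval(x, y, t_crit=3.182)
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

Because this interval contains 0, these particular invented data would not rule out a population slope of zero.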

Hypothesis Test • We’re normally interested in the null hypothesis $H_0: \beta = 0$ because if we reject the null, the data suggest there is a useful linear relationship between our two variables • We call this the ‘Model Utility Test for Simple Linear Regression’

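Summary of the Test • Test Statistic: $t = \dfrac{b - 0}{s_b} = \dfrac{b}{s_b}$, where $s_b$ is the estimated standard deviation of b, with n-2 degrees of freedom • Assumptions are the same four as those for the simple linear regression model.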
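
The test can be sketched as below, using the critical-value approach rather than a P-value so that only a t table is needed. The data are invented, the function name `model_utility_test` is hypothetical, and the two-sided critical value is an input the reader supplies for n - 2 degrees of freedom.

```python
from math import sqrt

def model_utility_test(x, y, t_crit):
    """Test H0: slope = 0 against a two-sided alternative.

    Returns (t, reject), where t = b / s_b and reject is True when
    |t| exceeds the supplied two-sided critical value for n - 2 df.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(ss_resid / (n - 2))   # standard deviation of the residuals
    s_b = s_e / sqrt(s_xx)           # estimated standard deviation of b
    t = b / s_b                      # hypothesized slope is 0
    return t, abs(t) > t_crit

# Hypothetical toy data, invented purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Significance level 0.05, two-sided, 3 degrees of freedom: table value 3.182.
t_stat, reject = model_utility_test(x, y, t_crit=3.182)
print(f"t = {t_stat:.4f}, reject H0: {reject}")
```

The function does not check the four model assumptions; in practice those would be examined (for example, with a residual plot) before relying on the result.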

Minitab Output
