1. Statistics with Economics and Business Applications Chapter 11 Linear Regression and Correlation
Correlation Coefficient, Least Squares, ANOVA Table, Test and Confidence Intervals, Estimation and Prediction
2. Review I. What's in the last two lectures?
Hypothesis tests for means and proportions (Chapter 8)
II. What's in the next three lectures?
Correlation coefficient and linear regression (read Chapter 11)
3. Introduction
4. Example: Age and Fatness
5. Example: Age and Fatness
6. Example: Advertising and Sales The following table contains sales (y) and advertising expenditures (x) for 10 branches of a retail store.
7. Describing the Scatterplot
8. Examples
9. Investigation of Relationship There are two approaches to investigating a linear relationship:
Correlation coefficient: a numerical measure of the strength and direction of the linear relationship between x and y.
Linear regression: a linear equation that expresses the relationship between x and y. It provides the form of the relationship.
10. The Correlation Coefficient The strength and direction of the relationship between x and y are measured using the correlation coefficient (Pearson product moment coefficient of correlation), r.
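(The slide's formula is not reproduced in this text; a standard statement of it, written with the Sxx, Syy, Sxy notation used later in the Key Concepts slides, is:)

\[
r \;=\; \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}},
\qquad
S_{xy} = \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}),
\quad
S_{xx} = \sum_{i=1}^{n}(x_i-\bar{x})^2,
\quad
S_{yy} = \sum_{i=1}^{n}(y_i-\bar{y})^2 .
\]

r always lies between −1 and +1; values near ±1 indicate a strong linear relationship, and values near 0 indicate a weak one.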
11. Example The table shows the heights and weights of
n = 10 randomly selected college football players.
12. Football Players
13. Interpreting r
14. Example: Advertising and Sales Often we want to investigate the form of the relationship for the purpose of description, control and prediction. For the advertising and sales example, sales increase as advertising expenditure increases. The relationship is almost, but not exactly, linear.
Description: how sales depend on advertising expenditure
Control: how much to spend on advertising to reach certain sales goals
Prediction: how much in sales to expect if we spend a certain amount of money on advertising
15. Linear Deterministic Model Denote x as the independent variable and y as the dependent variable. For example, x = age and y = %fat in the first example; x = advertising expenditure and y = sales in the second example.
We want to find how y depends on x, or how to predict y using x.
One of the simplest deterministic mathematical relationships between two variables x and y is a linear relationship y = α + βx
16. Reality In the real world, things are never so clean!
Age influences fatness, but it is not the sole influence. There are other factors such as sex, body type and random variation (e.g. measurement error)
Other factors besides advertising expenditure, such as time of year, state of the economy and size of inventory, can influence sales
Observations of (x, y) do not fall on a straight line
17. Probabilistic Model Probabilistic model:
y = deterministic model + random error
Random error represents random fluctuation from the deterministic model
The probabilistic model is assumed for the population
Simple linear regression model:
y = α + βx + ε
Without the random deviation ε, all observed (x, y) points would fall exactly on the deterministic line. The inclusion of ε in the model equation allows points to deviate from the line by random amounts.
18. Simple Linear Regression Model
19. Basic Assumptions of the Simple Linear Regression Model The distribution of ε at any particular x value has mean value 0.
The standard deviation of ε is the same for any particular value of x. This standard deviation is denoted by σ.
The distribution of ε at any particular x value is normal.
The random errors are independent of one another.
20. Interpretation of Terms
21. The Distribution of y For any fixed x, y has a normal distribution with mean α + βx and standard deviation σ.
22. Data So far we have described the population probabilistic model.
Usually the three population parameters, α, β and σ, are unknown. We need to estimate them from the data.
Data: n pairs of observations of independent and dependent variables
(x1,y1), (x2,y2), …, (xn,yn)
Probabilistic model:
yi = α + βxi + εi,  i = 1, …, n
The εi are independent normal with mean 0 and standard deviation σ.
23. Steps in Regression Analysis
24. The Method of Least Squares The equation of the best-fitting line is calculated using n pairs of data (xi, yi).
25. Least Squares Estimators
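(The estimator formulas themselves did not survive in this text; the standard least squares solutions, consistent with the Key Concepts slides, are:)

\[
b \;=\; \frac{S_{xy}}{S_{xx}},
\qquad
a \;=\; \bar{y} - b\,\bar{x},
\qquad
\hat{y} \;=\; a + b\,x ,
\]

where a and b estimate the population intercept α and slope β, and ŷ = a + bx is the best-fitting (least squares) line.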
26. Example: Age and Fatness
27. Example: Age and Fatness
28. Example: Age and Fatness
29. The Analysis of Variance The total variation in the experiment is measured by the total sum of squares:
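(The sums of squares are not shown in this text; the standard decomposition, matching the Total SS = SSR + SSE identity in the Key Concepts slides, is:)

\[
\text{Total SS} \;=\; S_{yy} \;=\; \sum_{i=1}^{n}(y_i-\bar{y})^2
\;=\; \text{SSR} + \text{SSE},
\qquad
\text{SSR} = \frac{(S_{xy})^2}{S_{xx}},
\qquad
\text{SSE} = \sum_{i=1}^{n}(y_i-\hat{y}_i)^2 .
\]

SSR measures the variation explained by the regression line; SSE measures the unexplained (error) variation.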
30. The ANOVA Table
Source       df       SS         Mean Squares
Regression   1        SSR        MSR = SSR / 1
Error        n − 2    SSE        MSE = SSE / (n − 2)
Total        n − 1    Total SS
31. The F Test We can test the overall usefulness of the linear model using an F test. If the model is useful, MSR will be large compared to the unexplained variation, MSE.
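(A standard form of the test statistic, using the degrees of freedom from the ANOVA table above, is:)

\[
F \;=\; \frac{\text{MSR}}{\text{MSE}}
\;=\; \frac{\text{SSR}/1}{\text{SSE}/(n-2)} ,
\]

which under H0: β = 0 has an F distribution with 1 numerator and n − 2 denominator degrees of freedom; large values of F lead us to reject H0.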
32. Coefficient of Determination r² is the square of the correlation coefficient r.
r² is a number between zero and one, and a value close to zero suggests a poor model.
It gives the proportion of variation in y that can be attributed to an approximate linear relationship between x and y.
A very high value of r² can arise even though the relationship between the two variables is non-linear. The fit of a model should never simply be judged from the r² value alone.
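(In terms of the ANOVA quantities, the standard expression is:)

\[
r^2 \;=\; \frac{\text{SSR}}{\text{Total SS}}
\;=\; 1 - \frac{\text{SSE}}{\text{Total SS}} .
\]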
33. Estimate of σ
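(The slide's formula is not preserved in this text; the usual estimate of σ, based on MSE from the ANOVA table, is:)

\[
s^2 \;=\; \text{MSE} \;=\; \frac{\text{SSE}}{n-2},
\qquad
s \;=\; \sqrt{\text{MSE}} .
\]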
34. Example: Age and Fatness
35. Example: Age and Fatness
36. Inference Concerning the Slope β
37. Sampling Distribution
38. Confidence Interval for β
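(The formulas on these two slides are not preserved in this text; the standard results for the least squares slope estimate b are:)

\[
b \;\sim\; N\!\left(\beta,\; \frac{\sigma}{\sqrt{S_{xx}}}\right),
\qquad
\frac{b-\beta}{s/\sqrt{S_{xx}}} \;\sim\; t_{n-2},
\qquad
\text{CI for } \beta:\;\; b \;\pm\; t_{\alpha/2,\,n-2}\,\frac{s}{\sqrt{S_{xx}}} .
\]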
39. Example: Age and Fatness
40. Hypothesis Tests Concerning β Step 1: Specify the null and alternative hypotheses
H0: β = β0 versus Ha: β ≠ β0 (two-sided test)
H0: β = β0 versus Ha: β > β0 (one-sided test)
H0: β = β0 versus Ha: β < β0 (one-sided test)
Step 2: Test statistic
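(The test statistic is not shown in this text; the standard form is:)

\[
t \;=\; \frac{b-\beta_0}{s/\sqrt{S_{xx}}} ,
\]

which has a t distribution with n − 2 degrees of freedom when H0 is true.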
41. Hypothesis Tests Concerning β Step 3: Find the p-value. Compute the observed value of the test statistic and find the p-value from the t distribution with n − 2 degrees of freedom.
42. Example: Age and Fatness
43. Interpreting a Significant Regression
44. Checking the Regression Assumptions
45. Residuals
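(A standard definition, consistent with the fitted line above, is:)

\[
e_i \;=\; y_i - \hat{y}_i \;=\; y_i - (a + b\,x_i),
\qquad i = 1,\dots,n .
\]

Plotting these residuals against the fitted values ŷi (or against xi), and on a normal probability plot, is the basis of the diagnostic tools on the next slides.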
46. Diagnostic Tools – Residual Plots
47. Normal Probability Plot
48. Residuals versus Fits
49. Estimation and Prediction
50. Estimation and Prediction
51. Estimation and Prediction
52. Estimation and Prediction
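(The interval formulas on these slides are not preserved in this text; the standard intervals at a given value x = x0 are:)

\[
\text{CI for the mean } E(y) \text{ at } x_0:\quad
\hat{y} \;\pm\; t_{\alpha/2,\,n-2}\, s\,
\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}},
\]
\[
\text{PI for a new observation } y \text{ at } x_0:\quad
\hat{y} \;\pm\; t_{\alpha/2,\,n-2}\, s\,
\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}},
\]

where ŷ = a + b x0. The extra 1 under the square root is why prediction intervals are always wider than the corresponding confidence intervals.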
53. Example: Age and Fatness
54. Example: Age and Fatness
55. Steps in Regression Analysis
56. Minitab Output
57. Key Concepts I. Scatter plot
Pattern and unusual observations
II. Correlation coefficient: a numerical measure of the strength and direction of the linear relationship between x and y.
Interpretation of r
58. Key Concepts III. A Linear Probabilistic Model
1. When the data exhibit a linear relationship, the appropriate model is y = α + βx + ε.
2. The random error ε has a normal distribution with mean 0 and variance σ².
IV. Method of Least Squares
1. Estimates a and b, for α and β, are chosen to minimize SSE, the sum of the squared deviations about the regression line.
2. The least squares estimates are b = Sxy / Sxx and a = ȳ − b x̄.
59. Key Concepts V. Analysis of Variance
1. Total SS = SSR + SSE, where Total SS = Syy and SSR = (Sxy)² / Sxx.
2. The best estimate of σ² is MSE = SSE / (n − 2).
VI. Testing, Estimation, and Prediction
1. A test for the significance of the linear regression, H0: β = 0, can be implemented using the test statistic t = b / (s / √Sxx), which has a t distribution with n − 2 degrees of freedom when H0 is true.
60. Key Concepts 2. The strength of the relationship between x and y can be measured using the coefficient of determination, r² = SSR / Total SS,
which gets closer to 1 as the relationship gets stronger.
3. Use residual plots to check for nonnormality, inequality of variances, and an incorrectly fit model.
4. Confidence intervals can be constructed to estimate the slope β of the regression line and to estimate the average value of y, for a given value of x.
5. Prediction intervals can be constructed to predict a particular observation, y, for a given value of x. For a given x, prediction intervals are always wider than confidence intervals.