
Statistics with Economics and Business Applications

Presentation Transcript


    1. Statistics with Economics and Business Applications Chapter 11 Linear Regression and Correlation Correlation Coefficient, Least Squares, ANOVA Table, Test and Confidence Intervals, Estimation and Prediction

    2. Review I. What was in the last two lectures? Hypothesis tests for means and proportions (Chapter 8). II. What's in the next three lectures? Correlation coefficient and linear regression (read Chapter 11).

    3. Introduction

    4. Example: Age and Fatness

    5. Example: Age and Fatness

    6. Example: Advertising and Sale The following table contains sales (y) and advertising expenditures (x) for 10 branches of a retail store.

    7. Describing the Scatterplot

    8. Examples

    9. Investigation of Relationship There are two approaches to investigating a linear relationship. Correlation coefficient: a numerical measure of the strength and direction of the linear relationship between x and y. Linear regression: a linear equation that expresses the relationship between x and y and provides the form of that relationship.

    10. The Correlation Coefficient The strength and direction of the relationship between x and y are measured using the correlation coefficient (Pearson product moment coefficient of correlation), r.
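The transcript does not reproduce the slide's formula, but the Pearson coefficient is the standard r = Sxy / √(Sxx · Syy). As a minimal sketch in plain Python (the data here are made-up illustrative pairs, not the football-player or age/fatness data from the slides):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (x, y) pairs with a strong positive linear trend
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(pearson_r(x, y))  # close to +1: strong positive linear relationship
```

A value near +1 or −1 indicates a strong linear relationship; a value near 0 indicates a weak linear relationship (though a strong non-linear one may still exist).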

    11. Example The table shows the heights and weights of n = 10 randomly selected college football players.

    12. Football Players

    13. Interpreting r

    14. Example: Advertising and Sale Often we want to investigate the form of the relationship for the purpose of description, control and prediction. For the advertising and sale example, sales increase as advertising expenditure increases. The relationship is almost, but not exactly, linear. Description: how sales depend on advertising expenditure. Control: how much to spend on advertising to reach certain sales goals. Prediction: how much in sales we expect if we spend a certain amount on advertising.

    15. Linear Deterministic Model Denote x as the independent variable and y as the dependent variable. For example, x = age and y = %fat in the first example; x = advertising expenditure and y = sales in the second example. We want to find how y depends on x, or how to predict y using x. One of the simplest deterministic mathematical relationships between two variables x and y is the linear relationship y = α + βx.

    16. Reality In the real world, things are never so clean! Age influences fatness, but it is not the sole influence; there are other factors, such as sex, body type and random variation (e.g. measurement error). Other factors besides advertising expenditure, such as the time of year, the state of the economy and the size of inventory, can influence sales. Observations of (x, y) do not fall exactly on a straight line.

    17. Probabilistic Model Probabilistic model: y = deterministic model + random error. The random error represents random fluctuation around the deterministic model. The probabilistic model is assumed for the population. Simple linear regression model: y = α + βx + e. Without the random deviation e, all observed (x, y) points would fall exactly on the deterministic line. The inclusion of e in the model equation allows points to deviate from the line by random amounts.

    18. Simple Linear Regression Model

    19. Basic Assumptions of the Simple Linear Regression Model 1. The distribution of e at any particular x value has mean value 0. 2. The standard deviation of e is the same for every value of x; this common standard deviation is denoted by σ. 3. The distribution of e at any particular x value is normal. 4. The random errors are independent of one another.
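These assumptions can be made concrete with a small simulation; the intercept, slope, and error standard deviation below are arbitrary illustrative values, not estimates from any example in the slides:

```python
import random

random.seed(0)
a, b, sigma = 10.0, 2.0, 1.5   # assumed "true" intercept, slope, error SD

def simulate(x_values):
    """Generate y = a + b*x + e with e ~ Normal(0, sigma), errors independent."""
    return [a + b * x + random.gauss(0.0, sigma) for x in x_values]

xs = [i / 10 for i in range(100)]
ys = simulate(xs)
# The simulated points scatter around the deterministic line y = a + b*x;
# the average deviation from the line is close to 0, as assumption 1 requires.
```

Because e has mean 0 and constant standard deviation at every x, the points form a band of roughly constant width around the line rather than falling exactly on it.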

    20. Interpretation of Terms

    21. The Distribution of y For any fixed x, y has a normal distribution with mean α + βx and standard deviation σ.

    22. Data So far we have described the population probabilistic model. The three population parameters, α, β and σ, are usually unknown; we need to estimate them from data. Data: n pairs of observations of the independent and dependent variables, (x1, y1), (x2, y2), …, (xn, yn). Probabilistic model: yi = α + βxi + ei, i = 1, …, n, where the ei are independent normal with mean 0 and standard deviation σ.

    23. Steps in Regression Analysis

    24. The Method of Least Squares The equation of the best-fitting line is calculated using n pairs of data (xi, yi).

    25. Least Squares Estimators
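The estimator formulas on this slide are not reproduced in the transcript, but the Key Concepts slides give them as b = Sxy / Sxx and a = ȳ − b x̄. A minimal sketch, using made-up data that lie exactly on a line so the answer is easy to check:

```python
def least_squares(x, y):
    """Least squares estimates: b = Sxy / Sxx, a = ybar - b * xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
print(least_squares(x, y))  # -> (1.0, 2.0)
```

These are the values of a and b that minimize SSE, the sum of squared vertical deviations of the observed points from the fitted line.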

    26. Example: Age and Fatness

    27. Example: Age and Fatness

    28. Example: Age and Fatness

    29. The Analysis of Variance The total variation in the experiment is measured by the total sum of squares: Total SS = Syy = Σ(yi − ȳ)².

    30. The ANOVA Table Regression df = 1, Error df = n − 2, Total df = n − 1. Mean squares: MSR = SSR / 1 and MSE = SSE / (n − 2).

    31. The F Test We can test the overall usefulness of the linear model using an F test with F = MSR / MSE on 1 and n − 2 degrees of freedom. If the model is useful, MSR will be large compared to the unexplained variation, MSE.
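The ANOVA decomposition and F statistic can be sketched in a few lines of Python, using the Key Concepts identities Total SS = Syy, SSR = (Sxy)² / Sxx, and SSE = Total SS − SSR (the data below are hypothetical, chosen to be nearly linear):

```python
def anova_f(x, y):
    """Decompose Total SS = SSR + SSE and return (SSR, SSE, F = MSR/MSE)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    ssr = sxy ** 2 / sxx           # regression SS, df = 1
    sse = syy - ssr                # error SS, df = n - 2
    msr, mse = ssr / 1, sse / (n - 2)
    return ssr, sse, msr / mse

# Hypothetical, nearly linear data: F should be very large
x = [1, 2, 3, 4, 5, 6]
y = [2.2, 3.9, 6.1, 8.2, 9.8, 12.1]
ssr, sse, f = anova_f(x, y)
```

A large F (relative to the F distribution with 1 and n − 2 df) means the line explains far more variation than it leaves unexplained.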

    32. Coefficient of Determination r² is the square of the correlation coefficient r. It is a number between zero and one, and a value close to zero suggests a poor model. It gives the proportion of variation in y that can be attributed to an approximate linear relationship between x and y. A very high value of r² can arise even though the relationship between the two variables is non-linear; the fit of a model should never be judged from the r² value alone.

    33. Estimate of s

    34. Example: Age and Fatness

    35. Example: Age and Fatness

    36. Inference Concerning the Slope β

    37. Sampling Distribution

    38. Confidence Interval for β
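The interval formula itself is not in the transcript; the standard form, consistent with the Key Concepts slides, is b ± t(α/2, n−2) · s / √Sxx with s = √MSE. A sketch with hypothetical data (n = 6, so the tabled critical value t(0.025, 4) = 2.776 is used):

```python
import math

def slope_ci(x, y, t_crit):
    """CI for the slope: b ± t_crit * s / sqrt(Sxx), where s^2 = SSE/(n-2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b = sxy / sxx
    sse = syy - sxy ** 2 / sxx
    s = math.sqrt(sse / (n - 2))
    half = t_crit * s / math.sqrt(sxx)
    return b - half, b + half

# Hypothetical data; t_crit = t(0.025, n-2) = t(0.025, 4) = 2.776 from a t table
x = [1, 2, 3, 4, 5, 6]
y = [2.2, 3.9, 6.1, 8.2, 9.8, 12.1]
lo, hi = slope_ci(x, y, t_crit=2.776)
```

Because the data are nearly linear, the interval is narrow and clearly excludes 0.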

    39. Example: Age and Fatness

    40. Hypothesis Tests Concerning β Step 1: Specify the null and alternative hypotheses H0: β = β0 versus Ha: β ≠ β0 (two-sided test) H0: β = β0 versus Ha: β > β0 (one-sided test) H0: β = β0 versus Ha: β < β0 (one-sided test) Step 2: Test statistic

    41. Hypothesis Tests Concerning β Step 3: Compute the sample test statistic and find the p-value.
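The test statistic, consistent with the Key Concepts slides, is t = (b − β0) / (s / √Sxx) with n − 2 degrees of freedom. A sketch with the same hypothetical data as above; the p-value step needs a t-distribution CDF (e.g. from scipy.stats), so here |t| is simply compared to a tabled critical value:

```python
import math

def slope_t_stat(x, y, beta0=0.0):
    """t = (b - beta0) / (s / sqrt(Sxx)), with s^2 = SSE / (n - 2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b = sxy / sxx
    s = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))
    return (b - beta0) / (s / math.sqrt(sxx))

# Hypothetical data; for n - 2 = 4 df, the two-sided 5% critical value is 2.776
x = [1, 2, 3, 4, 5, 6]
y = [2.2, 3.9, 6.1, 8.2, 9.8, 12.1]
t = slope_t_stat(x, y)   # H0: beta = 0
```

Here |t| far exceeds 2.776, so H0: β = 0 is rejected at the 5% level: the slope is significantly different from zero.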

    42. Example: Age and Fatness

    43. Interpreting a Significant Regression

    44. Checking the Regression Assumptions

    45. Residuals

    46. Diagnostic Tools – Residual Plots

    47. Normal Probability Plot

    48. Residuals versus Fits

    49. Estimation and Prediction

    50. Estimation and Prediction

    51. Estimation and Prediction

    52. Estimation and Prediction
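The transcript omits the interval formulas from these slides; the standard forms are ŷ ± t_crit · s · √(1/n + (x0 − x̄)²/Sxx) for estimating the average value of y at x0, and ŷ ± t_crit · s · √(1 + 1/n + (x0 − x̄)²/Sxx) for predicting a single new y. A sketch with hypothetical data, showing that the prediction interval is always wider:

```python
import math

def intervals_at(x, y, x0, t_crit):
    """Return (CI for mean of y at x0, PI for a new y at x0); the PI is wider."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b = sxy / sxx
    a = ybar - b * xbar
    s = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))
    yhat = a + b * x0
    se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)        # estimation
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)    # prediction
    return ((yhat - t_crit * se_mean, yhat + t_crit * se_mean),
            (yhat - t_crit * se_pred, yhat + t_crit * se_pred))

# Hypothetical data; t_crit = t(0.025, 4) = 2.776 for a 95% level
x = [1, 2, 3, 4, 5, 6]
y = [2.2, 3.9, 6.1, 8.2, 9.8, 12.1]
ci, pi = intervals_at(x, y, x0=3.5, t_crit=2.776)
```

Both intervals are centered at ŷ = a + b·x0; the extra "1 +" under the square root for prediction accounts for the random error e in a single new observation, which is why the prediction interval is always wider than the confidence interval.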

    53. Example: Age and Fatness

    54. Example: Age and Fatness

    55. Steps in Regression Analysis

    56. Minitab Output

    57. Key Concepts I. Scatterplot Pattern and unusual observations II. Correlation Coefficient A numerical measure of the strength and direction of the linear relationship between x and y. Interpretation of r

    58. Key Concepts III. A Linear Probabilistic Model 1. When the data exhibit a linear relationship, the appropriate model is y = α + βx + e. 2. The random error e has a normal distribution with mean 0 and variance σ². IV. Method of Least Squares 1. Estimates a and b, for α and β, are chosen to minimize SSE, the sum of the squared deviations about the regression line. 2. The least squares estimates are b = Sxy / Sxx and a = ȳ − b x̄.

    59. Key Concepts V. Analysis of Variance 1. Total SS = SSR + SSE, where Total SS = Syy and SSR = (Sxy)² / Sxx. 2. The best estimate of σ² is MSE = SSE / (n − 2). VI. Testing, Estimation, and Prediction 1. A test for the significance of the linear regression, H0: β = 0, can be implemented using the statistic t = b / (s / √Sxx), where s = √MSE, with n − 2 degrees of freedom.

    60. Key Concepts 2. The strength of the relationship between x and y can be measured using the coefficient of determination r², which gets closer to 1 as the relationship gets stronger. 3. Use residual plots to check for nonnormality, unequal variances, and an incorrectly fit model. 4. Confidence intervals can be constructed to estimate the slope β of the regression line and to estimate the average value of y for a given value of x. 5. Prediction intervals can be constructed to predict a particular observation, y, for a given value of x. For a given x, prediction intervals are always wider than confidence intervals.
