# Session 1

##### Presentation Transcript

1. Session 1

2. Outline for Session 1 • Course Objectives & Description • Review of Basic Statistical Ideas • Intercept, Slope, Correlation, Causality • Simple Linear Regression • Statistical Model and Concepts • Regression in Excel Applied Regression -- Prof. Juran

3. Course Themes • Learn useful and practical tools of regression and data analysis • Learn by example and by doing • Learn enough theory to use regression safely

4. Shape the course experience to meet your goals • The agenda is flexible • Pick your own project • The professor also enjoys learning • Let’s enjoy ourselves – life is too short

5. Basic Information www.columbia.edu/~dj114/b8114.htm Teaching Assistant: • Davide Crapis

6. Basic Requirements • Come to class and participate • Cases about once per week • Project

7. What is Regression Analysis? • A Procedure for Data Analysis • Regression analysis is a family of mathematical procedures for fitting functions to data. • The most basic procedure -- simple linear regression -- fits a straight line to a set of data so that the sum of the squared “y deviations” is minimal. Regression can be used on a completely pragmatic basis.

8. What is Regression Analysis? • A Foundation for Statistical Inference • If special statistical conditions hold, the regression analysis: • Produces statistically “best” estimates of the “true” underlying relationship and its components • Provides measures of the quality and reliability of the fitted function • Provides the basis for hypothesis tests and confidence and prediction intervals

9. Some Regression Applications • Determining the factors that influence energy consumption in a detergent plant • Measuring the volatility of financial securities • Determining the influence of ambient launch temperature on Space Shuttle O-ring burn-through • Identifying demographic and purchase history factors that predict high consumer response to catalog mailings • Mounting a legal defense against a charge of sex discrimination in pay • Determining the cause of leaking antifreeze bottles on a packing line • Measuring the fairness of CEO compensation • Predicting monthly champagne sales

10. Course Outline • Basics of regression • Bottom: inferences about effects of independent variables on the dependent variable • Middle: Analysis of Variance • Top: summary measures for the model

11. Course Outline • Advanced Regression Topics • Interval Estimation • Full Model with Arrays • Qualitative Variables • Residual Analysis • Thoughts on Nonlinear Regression • Model-building Ideas • Multicollinearity • Autocorrelation, serial correlation

12. Course Outline • Related Topics • Chi-square Goodness-of-Fit Tests • Forecasting Methods • Exponential Smoothing • Regression • Two Multivariate Methods • Cluster Analysis • Discriminant Analysis • Binary Logistic Regression

13. The Theory Underlying Simple Linear Regression Regression can always be used to fit a straight line to a set of data. It is a relatively easy computational task (Excel, Minitab, etc.). If specified conditions hold, statistical theory can be employed to evaluate the quality and reliability of the line for prediction of future events.

14. The Standard Statistical Model • Y: the “dependent” random variable, the effect or outcome that we wish to predict or understand. • X: the “independent” deterministic variable, an input, cause or determinant that may cause, influence, explain or predict the values of Y. The model is $Y = \beta_0 + \beta_1 X + \varepsilon$, where $Y$ is the dependent random variable, $X$ is the independent deterministic variable, $\beta_0$ and $\beta_1$ are the parameters of the “true” regression relationship, and $\varepsilon$ is a random “noise” factor.

15. Regression Assumptions The expected value of Y is a linear function of X: $E(Y \mid X) = \beta_0 + \beta_1 X$. The variance of Y does not change with X: $\mathrm{Var}(Y \mid X) = \sigma^2$.

16. Regression Assumptions Random variations $\varepsilon_i$ at different X values are uncorrelated: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$. Random variations from the regression line are normally distributed: $\varepsilon_i \sim N(0, \sigma^2)$.
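
One way to make the regression assumptions concrete is to simulate data that satisfies them and check that a least-squares fit recovers the parameters. The sketch below is illustrative only: the parameter values (`BETA0`, `BETA1`, `SIGMA`), the grid of x values, and the random seed are assumptions chosen for the example, not values from the course.

```python
import random

# Hypothetical "true" parameters, chosen only for illustration.
BETA0, BETA1, SIGMA = 1.0, 2.0, 1.0

def simulate(n, seed=0):
    """Draw n points from Y = BETA0 + BETA1*X + eps with eps ~ N(0, SIGMA^2)."""
    rng = random.Random(seed)
    xs = [10.0 * i / (n - 1) for i in range(n)]  # deterministic X values
    # Independent normal noise with the same variance at every x.
    ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]
    return xs, ys

def least_squares(xs, ys):
    """Closed-form simple linear regression estimates (intercept, slope)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

xs, ys = simulate(2000)
b0_hat, b1_hat = least_squares(xs, ys)
print(b0_hat, b1_hat)  # estimates should land near BETA0 and BETA1
```

Changing `SIGMA` to depend on x, or reusing the same noise draw for neighboring points, would violate the constant-variance and uncorrelated-errors assumptions respectively.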

17. Thoughts on Linearity The word “linear” in the linear regression model refers not to linearity in the X’s but to linearity in the betas (the regression coefficients). Consider the following variants – both of which are linear:
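
The two variants shown on the slide are not preserved in this transcript. As a stand-in, the following are common textbook examples (assumed here, not taken from the slide) of models that are curved in X yet linear in the betas, followed by one that is not linear in the betas:

```latex
% Linear in the betas, so ordinary linear regression applies:
Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon
Y = \beta_0 + \beta_1 \ln X + \varepsilon

% Not linear in the betas (beta_1 sits inside the exponential):
Y = \beta_0 + e^{\beta_1 X} + \varepsilon
```

In the first two cases, $X^2$ and $\ln X$ can simply be computed as new columns and treated as ordinary predictors.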

18. There are many creative ways to fit non-linear functions by linear regression. Consider a few popular linearizations: Time permitting, we will look at some of these possibilities later in the course. These may present interesting opportunities for student term projects.
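
The slide's own list of linearizations is missing from the transcript. One widely used example, offered here as an assumption about the kind of transformation meant, is fitting an exponential curve $y = a e^{bx}$ by regressing $\ln y$ on $x$. The data below is generated inside the script purely for illustration, and `fit_line` is a hypothetical helper name.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Hypothetical data generated exactly from y = 2 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# ln(y) = ln(a) + b*x is linear in ln(a) and b, so regress ln(y) on x.
intercept, slope = fit_line(xs, [math.log(y) for y in ys])
a_hat, b_hat = math.exp(intercept), slope
print(a_hat, b_hat)  # recovers a and b up to rounding, since the data is noise-free
```

With real data, fitting on the log scale implicitly assumes multiplicative errors, so the result generally differs from a direct nonlinear fit on the original scale.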

19. Regression Estimators We are given the data set: $(x_i, y_i)$, $i = 1, 2, \ldots, n$. We seek good estimators $\hat{\beta}_0$ of $\beta_0$ and $\hat{\beta}_1$ of $\beta_1$ that minimize the sum of the squared residuals (errors). The $i$th residual is $e_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$, $i = 1, 2, \ldots, n$.
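
The residual definition above can be sketched in a few lines. The five data points are invented for illustration (they are not the course's data), and `residuals` and `sse` are hypothetical helper names.

```python
def residuals(x, y, b0, b1):
    """Residuals e_i = y_i - (b0 + b1 * x_i) for a candidate line."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def sse(x, y, b0, b1):
    """Sum of squared residuals, the quantity least squares minimizes."""
    return sum(e * e for e in residuals(x, y, b0, b1))

# Hypothetical data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

best = sse(x, y, 2.2, 0.6)   # the least-squares line for this data
other = sse(x, y, 2.0, 0.5)  # an arbitrary alternative line
print(best, other)           # about 2.4 versus 3.75
```

Any line other than the least-squares line gives a larger sum of squared residuals on the same data.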

20. Computer Repair Example

21. Statistical Basics Basic statistical computations and graphical displays are very helpful in doing and interpreting a regression. We should always compute:
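
The list of quantities on this slide is not preserved in the transcript. The sketch below computes the summary statistics most commonly examined before a regression (means, sample standard deviations, sample covariance, and correlation); treating these as the intended list is an assumption. The data is invented for illustration.

```python
import statistics

# Hypothetical data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_mean, y_mean = statistics.mean(x), statistics.mean(y)
x_sd, y_sd = statistics.stdev(x), statistics.stdev(y)  # sample (n - 1) versions

# Sample covariance, computed by hand with the n - 1 divisor.
n = len(x)
cov_xy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / (n - 1)

# Sample correlation is the covariance scaled by both standard deviations.
r_xy = cov_xy / (x_sd * y_sd)

print(x_mean, y_mean, x_sd, y_sd, cov_xy, r_xy)
```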

23. Graphical Analysis We should always plot histograms of the y and x values, a time-order plot of x and y (if appropriate), and a scatter plot of y on x.

27. Estimating Parameters • Using Excel • Using Solver • Using analytical formulas

28. Using Excel (Scatter Diagram)

30. Using Excel (Data Analysis) Data Tab – Data Analysis

31. Using Excel (Data Analysis)

32. Using Solver
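
The Solver approach treats estimation as a numerical optimization: choose the intercept and slope that make the sum of squared residuals as small as possible. The sketch below imitates that idea with plain gradient descent, which is a stand-in and not the algorithm Excel's Solver actually uses. The data, starting values, step size, and iteration count are all assumptions chosen for the example.

```python
# Hypothetical data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sse(b0, b1):
    """Objective: sum of squared residuals for the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def minimize_sse(steps=5000, rate=0.01):
    """Gradient descent on the SSE surface, starting from (0, 0)."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        grad_b0 = -2.0 * sum(resid)                             # d(SSE)/d(b0)
        grad_b1 = -2.0 * sum(e * xi for e, xi in zip(resid, x)) # d(SSE)/d(b1)
        b0 -= rate * grad_b0
        b1 -= rate * grad_b1
    return b0, b1

b0_num, b1_num = minimize_sse()
print(b0_num, b1_num, sse(b0_num, b1_num))  # close to 2.2, 0.6 and SSE about 2.4
```

The step size has to be small enough for the iteration to be stable on this particular data; a different data set may need a different `rate`.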

35. Using Formulas RABE 2.13 RABE 2.13
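
The formulas on this slide did not survive in the transcript. The standard closed-form least-squares estimates, which are presumably what the slide refers to, are $\hat{\beta}_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. A minimal sketch, using invented data and a hypothetical helper name `fit_line`:

```python
def fit_line(xs, ys):
    """Closed-form least-squares estimates (intercept, slope)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))  # cross-deviations
    sxx = sum((a - xbar) ** 2 for a in xs)                      # x deviations squared
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

# Hypothetical data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0_hat, b1_hat = fit_line(x, y)
print(b0_hat, b1_hat)  # about 2.2 and 0.6
```

Because the fitted intercept is defined as $\bar{y} - \hat{\beta}_1 \bar{x}$, the fitted line always passes through the point $(\bar{x}, \bar{y})$.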

37. Correlation and Regression There is a close relationship between regression and correlation. The correlation coefficient, $\rho$, measures the degree to which random variables X and Y move together or not. $\rho = +1$ implies a perfect positive linear relationship, while $\rho = -1$ implies a perfect negative linear relationship. $\rho = 0$ implies no linear relationship; it is equivalent to independence only in special cases, such as when X and Y are jointly normal.

38. Statistical Basics: Covariance The covariance can be calculated using: or equivalently Usually, we find it more useful to consider the coefficient of correlation. That is, Sometimes the inverse relation is useful:
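
The four formulas on this slide are missing from the transcript. The standard population definitions below match the wording of the slide and are probably close to what was shown, but the exact notation (for example, population versus sample versions) is an assumption:

```latex
% Covariance, definition and equivalent computational form:
\mathrm{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big]
\mathrm{Cov}(X, Y) = E[XY] - \mu_X \mu_Y

% Coefficient of correlation:
\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}

% Inverse relation:
\mathrm{Cov}(X, Y) = \rho \, \sigma_X \, \sigma_Y
```

Dividing by $\sigma_X \sigma_Y$ removes the units of X and Y, which is why the correlation is easier to interpret than the raw covariance.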

39. Correlation and Regression • The sample (Pearson) correlation coefficient is $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$ • Regressions automatically produce an estimate of the squared correlation called $R^2$ or R-square. Values of R-square close to 1 indicate a strong linear relationship, while values close to 0 indicate a weak or non-existent linear relationship.
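
In simple linear regression, R-square equals the square of the sample correlation. The sketch below checks this numerically on invented data, computing R-square as one minus the ratio of residual to total sum of squares; the variable names are illustrative.

```python
import math

# Hypothetical data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
sxx = sum((a - xbar) ** 2 for a in x)
syy = sum((b - ybar) ** 2 for b in y)  # total sum of squares

# Sample Pearson correlation.
r = sxy / math.sqrt(sxx * syy)

# Least-squares fit and its residual sum of squares.
b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

r_squared = 1.0 - sse / syy
print(r, r_squared, r * r)  # r_squared and r*r agree up to rounding
```

This identity holds only for a single predictor; with several predictors, R-square is the squared correlation between the observed and fitted y values instead.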

40. Some Validity Issues • We need to evaluate the strength of the relationship, whether we have the proper functional form, and the validity of the several statistical assumptions from a practical and theoretical viewpoint using a multiplicity of tools. • Fitted regression functions are interpolations of the data in hand, and extrapolation is always dangerous. Moreover, the functional form that fits the data in our range of “experience” may not fit beyond it.

41. • Regressions are based on past data. Why should the same functional form and parameters hold in the future? • In some uses of regression the future value of x may not be known – this adds greatly to our uncertainty. • In collecting data to do a regression, choose x values wisely when you have a choice. They should: • Be in the range where you intend to work • Be spread out along the range, with some observations near practical extremes • Have replicated values at the same x or at very nearby x values for good estimation of $\sigma$ • Whenever possible, test the stability of your model with a “holdout” sample not used in the original model fitting.
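
The holdout idea can be sketched as follows: fit the line on part of the data, then measure prediction error on the observations that were set aside. The data, the every-third-point split, and the helper names are all assumptions made for illustration; in practice the holdout sample is often chosen at random.

```python
import math

# Hypothetical data, invented for illustration (roughly y = 1 + 2x plus noise).
x = list(range(1, 11))
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2, 19.1, 20.9]

def fit_line(xs, ys):
    """Closed-form least-squares estimates (intercept, slope)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    sxx = sum((a - xbar) ** 2 for a in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def rmse(xs, ys, b0, b1):
    """Root mean squared prediction error of the line on (xs, ys)."""
    return math.sqrt(sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(xs, ys)) / len(xs))

# Set aside every third observation as the holdout sample.
holdout_idx = [i for i in range(len(x)) if i % 3 == 2]
train_idx = [i for i in range(len(x)) if i % 3 != 2]

x_train, y_train = [x[i] for i in train_idx], [y[i] for i in train_idx]
x_hold, y_hold = [x[i] for i in holdout_idx], [y[i] for i in holdout_idx]

b0, b1 = fit_line(x_train, y_train)       # fit on training data only
rmse_train = rmse(x_train, y_train, b0, b1)
rmse_hold = rmse(x_hold, y_hold, b0, b1)  # error on data the fit never saw
print(b0, b1, rmse_train, rmse_hold)
```

A holdout error much larger than the training error would suggest that the fitted model is not stable.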

42. Summary • Course Objectives & Description • Review of Basic Statistical Ideas • Intercept, Slope, Correlation, Causality • Simple Linear Regression • Statistical Model and Concepts • Regression in Excel • Computer Repair Example