
  1. Multiple Linear Regression AMS 572 Group #2

  2. Outline • Jinmiao Fu—Introduction and History • Ning Ma—Establishment and Fitting of the Model • Ruoyu Zhou—Multiple Regression Model in Matrix Notation • Dawei Xu and Yuan Shang—Statistical Inference for Multiple Regression • Yu Mu—Regression Diagnostics • Chen Wang and Tianyu Lu—Topics in Regression Modeling • Tian Feng—Variable Selection Methods • Hua Mo—Chapter Summary and Modern Application

  3. Introduction • Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. Every value of each independent variable x is associated with a value of the dependent variable y.

  4. Example: the relationship between an adult’s health and his/her daily intake of wheat, vegetables and meat.

  5. History

  6. Karl Pearson (1857–1936) Lawyer, Germanist, eugenicist, mathematician and statistician. Contributions: the correlation coefficient; the method of moments; Pearson's system of continuous curves; chi distance and the p-value; statistical hypothesis testing theory and statistical decision theory; Pearson's chi-square test; principal component analysis.

  7. Sir Francis Galton FRS (16 February 1822 – 17 January 1911) Anthropologist and polymath; doctoral student: Karl Pearson. In the late 1860s, Galton conceived the standard deviation. He created the statistical concept of correlation and also discovered the properties of the bivariate normal distribution and its relationship to regression analysis.

  8. Galton invented the use of the regression line (Bulmer 2003, p. 184), and was the first to describe and explain the common phenomenon of regression toward the mean, which he first observed in his experiments on the size of the seeds of successive generations of sweet peas.

  9. The publication by his cousin Charles Darwin of The Origin of Species in 1859 was an event that changed Galton's life. He came to be gripped by the work, especially the first chapter on "Variation under Domestication" concerning the breeding of domestic animals.

  10. Adrien-Marie Legendre (18 September 1752 – 10 January 1833) was a French mathematician. He made important contributions to statistics, number theory, abstract algebra and mathematical analysis. He developed the least squares method, which has broad application in linear regression, signal processing, statistics, and curve fitting.

  11. Johann Carl Friedrich Gauss (30 April 1777 – 23 February 1855) was a German mathematician and scientist who contributed significantly to many fields, including number theory, statistics, analysis, differential geometry, geodesy, geophysics, electrostatics, astronomy and optics.

  12. Gauss, who was 23 at the time, heard about the problem and tackled it. After three months of intense work, he predicted a position for Ceres in December 1801—just about a year after its first sighting—and this turned out to be accurate within a half-degree. In the process, he so streamlined the cumbersome mathematics of 18th century orbital prediction that his work—published a few years later as Theory of Celestial Movement—remains a cornerstone of astronomical computation.

  13. It introduced the Gaussian gravitational constant, and contained an influential treatment of the method of least squares, a procedure used in all sciences to this day to minimize the impact of measurement error. Gauss was able to prove the method in 1809 under the assumption of normally distributed errors (see Gauss–Markov theorem; see also Gaussian). The method had been described earlier by Adrien-Marie Legendre in 1805, but Gauss claimed that he had been using it since 1795.

  14. Sir Ronald Aylmer Fisher FRS (17 February 1890 – 29 July 1962) was an English statistician, evolutionary biologist, eugenicist and geneticist. He was described by Anders Hald as "a genius who almost single-handedly created the foundations for modern statistical science," and Richard Dawkins described him as "the greatest of Darwin's successors".

  15. In addition to "analysis of variance", Fisher invented the technique of maximum likelihood and originated the concepts of sufficiency, ancillarity, Fisher's linear discriminant and Fisher information.

  16. Establishment and Fitting of the Model

  17. Probabilistic Model: the observed value yᵢ of the random variable (r.v.) Yᵢ depends on the fixed predictor values xᵢ₁, xᵢ₂, …, xᵢₖ: Yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + … + βₖxᵢₖ + εᵢ, i = 1, 2, …, n, where β₀, β₁, …, βₖ are unknown model parameters, the εᵢ are i.i.d. ~ N(0, σ²), and n is the number of observations.

  18. Fitting the Model • The LS method provides estimates β̂₀, β̂₁, …, β̂ₖ of the unknown model parameters, which minimize Q = Σᵢ [yᵢ − (β₀ + β₁xᵢ₁ + … + βₖxᵢₖ)]²; they solve the normal equations ∂Q/∂βⱼ = 0 (j = 0, 1, …, k).

  19. Tire tread wear vs. mileage (example11.1 in textbook) The table gives the measurements on the groove of one tire after every 4000 miles. Our Goal: to build a model to find the relation between the mileage and groove depth of the tire.

  20. SAS code----fitting the model

  data example;
    input mile depth @@;
    sqmile = mile*mile;
  datalines;
  0 394.33 4 329.5 8 291 12 255.17 16 229.33
  20 204.83 24 179 28 163.83 32 150.33
  ;
  run;

  proc reg data=example;
    model depth = mile sqmile;
  run;

  21. The fitted model: Depth = 386.26 - 12.77 mile + 0.172 sqmile
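The same fit can be reproduced outside SAS; here is a minimal sketch in Python with NumPy (not part of the original slides), fitting the quadratic model to the Example 11.1 data:

```python
import numpy as np

# Tire tread wear data (Example 11.1): groove depth (mils) after every 4000 miles
mile = np.array([0, 4, 8, 12, 16, 20, 24, 28, 32], dtype=float)
depth = np.array([394.33, 329.50, 291.00, 255.17, 229.33,
                  204.83, 179.00, 163.83, 150.33])

# Design matrix with columns 1, mile, mile^2 (same model as the PROC REG call)
X = np.column_stack([np.ones_like(mile), mile, mile ** 2])

# Least squares fit: minimizes the sum of squared residuals
beta, _, _, _ = np.linalg.lstsq(X, depth, rcond=None)
print(beta)  # close to [386.26, -12.77, 0.172]
```

The coefficients agree with the SAS output above.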

  22. Goodness of Fit of the Model • Residuals: eᵢ = yᵢ − ŷᵢ, where the ŷᵢ = β̂₀ + β̂₁xᵢ₁ + … + β̂ₖxᵢₖ are the fitted values. • An overall measure of the goodness of fit is the error sum of squares: SSE = Σᵢ eᵢ² = Σᵢ (yᵢ − ŷᵢ)². • Total sum of squares: SST = Σᵢ (yᵢ − ȳ)². • Regression sum of squares: SSR = SST − SSE. • The coefficient of determination r² = SSR/SST = 1 − SSE/SST measures the proportion of variation in y explained by the regression.
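These sums of squares can be computed directly from the fitted model; a small illustration in Python/NumPy on the Example 11.1 data (not part of the original slides):

```python
import numpy as np

# Example 11.1 data: groove depth vs. mileage
mile = np.array([0, 4, 8, 12, 16, 20, 24, 28, 32], dtype=float)
depth = np.array([394.33, 329.50, 291.00, 255.17, 229.33,
                  204.83, 179.00, 163.83, 150.33])
X = np.column_stack([np.ones_like(mile), mile, mile ** 2])
beta, _, _, _ = np.linalg.lstsq(X, depth, rcond=None)

fitted = X @ beta                           # fitted values y-hat_i
resid = depth - fitted                      # residuals e_i = y_i - y-hat_i
SSE = np.sum(resid ** 2)                    # error sum of squares
SST = np.sum((depth - depth.mean()) ** 2)   # total sum of squares
SSR = SST - SSE                             # regression sum of squares
r2 = SSR / SST                              # coefficient of determination
```

For this data r² is very close to 1, reflecting the nearly quadratic tread-wear pattern.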

  23. Multiple Regression Model In Matrix Notation

  24. 1. Transform the Formulas to Matrix Notation

  25. The first column of X corresponds to the constant (intercept) term; we can treat it as a predictor xᵢ₀ with xᵢ₀ ≡ 1 for all i.

  26. Finally, let β = (β₀, β₁, …, βₖ)ᵀ and β̂ = (β̂₀, β̂₁, …, β̂ₖ)ᵀ be the (k+1)×1 vectors of the unknown parameters and their LS estimates, respectively.

  27. The model formula becomes y = Xβ + ε. • Simultaneously, the normal equations become XᵀXβ̂ = Xᵀy. Solving this equation with respect to β̂, we get β̂ = (XᵀX)⁻¹Xᵀy (if the inverse of the matrix XᵀX exists).

  28. 2. Example 11.2 (Tire Wear Data: Quadratic Fit Using Hand Calculations) • We will do Example 11.1 again in this part using the matrix approach. • For the quadratic model to be fitted, y is the 9×1 vector of groove depths and X is the 9×3 matrix whose ith row is (1, mileᵢ, mileᵢ²).

  29. According to the formula β̂ = (XᵀX)⁻¹Xᵀy, we need to calculate XᵀX first, then invert it to get (XᵀX)⁻¹.

  30. Finally, we calculate the vector of LS estimates β̂ = (XᵀX)⁻¹Xᵀy.

  31. Therefore, the LS quadratic model is Depth = 386.26 - 12.77 mile + 0.172 mile². This model is the same as the one we obtained in Example 11.1.
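The hand calculation above can be mirrored with explicit matrix operations; a sketch in Python/NumPy (not part of the original slides, data as in Example 11.1):

```python
import numpy as np

mile = np.array([0, 4, 8, 12, 16, 20, 24, 28, 32], dtype=float)
depth = np.array([394.33, 329.50, 291.00, 255.17, 229.33,
                  204.83, 179.00, 163.83, 150.33])

# X is the 9x3 design matrix with rows (1, mile_i, mile_i^2)
X = np.column_stack([np.ones_like(mile), mile, mile ** 2])

XtX = X.T @ X                       # X'X
XtX_inv = np.linalg.inv(XtX)        # (X'X)^{-1}
beta_hat = XtX_inv @ X.T @ depth    # LS estimates (X'X)^{-1} X'y
```

The result matches the coefficients obtained in Example 11.1.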

  32. Statistical Inference for Multiple Regression

  33. Statistical Inference for Multiple Regression • Determine which predictor variables have statistically significant effects. • We test the hypotheses H₀ⱼ: βⱼ = 0 vs. H₁ⱼ: βⱼ ≠ 0 (j = 1, …, k). • If we cannot reject H₀ⱼ, then xⱼ is not a significant predictor of y.

  34. Statistical Inference on βⱼ • Review statistical inference for simple linear regression.

  35. Statistical Inference on βⱼ • What about multiple regression? • The steps are similar.

  36. Statistical Inference on βⱼ • What is Vⱼⱼ, and why is β̂ⱼ ~ N(βⱼ, σ²Vⱼⱼ)? • 1. Mean: recall from simple linear regression that the least squares estimators of the regression parameters are unbiased. Here the vector β̂ of least squares estimators is also unbiased: E(β̂) = β.

  37. Statistical Inference on βⱼ • 2. Variance • Under the constant variance assumption Var(εᵢ) = σ², the covariance matrix of β̂ is Cov(β̂) = σ²(XᵀX)⁻¹.

  38. Statistical Inference on βⱼ • Let Vⱼⱼ be the jth diagonal entry of the matrix V = (XᵀX)⁻¹, so that Var(β̂ⱼ) = σ²Vⱼⱼ.

  39. Statistical Inference on βⱼ • Since β̂ⱼ ~ N(βⱼ, σ²Vⱼⱼ), we have (β̂ⱼ − βⱼ)/(σ√Vⱼⱼ) ~ N(0, 1).

  40. Statistical Inference on βⱼ • σ² is unknown; estimate it by s² = SSE/[n − (k+1)] = MSE, where [n − (k+1)]s²/σ² follows a chi-square distribution with n − (k+1) d.f., independently of β̂ⱼ.

  41. Statistical Inference on βⱼ • Therefore, t = (β̂ⱼ − βⱼ)/(s√Vⱼⱼ) follows a t-distribution with n − (k+1) d.f., where SE(β̂ⱼ) = s√Vⱼⱼ.

  42. Statistical Inference on βⱼ • Derivation of the confidence interval for βⱼ: the 100(1 − α)% confidence interval for βⱼ is β̂ⱼ ± t_{n-(k+1), α/2} SE(β̂ⱼ).

  43. Statistical Inference on βⱼ • Reject H₀ⱼ if |tⱼ| > t_{n-(k+1), α/2}, where tⱼ = β̂ⱼ/SE(β̂ⱼ).
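The t-statistics and confidence intervals can be computed step by step for the tire data; a sketch in Python/NumPy (not part of the original slides; the critical value t_{6, 0.025} ≈ 2.447 is hard-coded from a t-table since n − (k+1) = 6 here):

```python
import numpy as np

mile = np.array([0, 4, 8, 12, 16, 20, 24, 28, 32], dtype=float)
depth = np.array([394.33, 329.50, 291.00, 255.17, 229.33,
                  204.83, 179.00, 163.83, 150.33])
X = np.column_stack([np.ones_like(mile), mile, mile ** 2])

n, p = X.shape                          # p = k + 1 = 3, so n - p = 6 d.f.
V = np.linalg.inv(X.T @ X)              # V, with V_jj on the diagonal
beta = V @ X.T @ depth                  # LS estimates
resid = depth - X @ beta
s2 = resid @ resid / (n - p)            # s^2 = SSE / [n - (k+1)] = MSE
se = np.sqrt(s2 * np.diag(V))           # SE(beta_j) = s * sqrt(V_jj)
t_stats = beta / se                     # t_j for testing H_0j: beta_j = 0

t_crit = 2.447                          # t_{6, 0.025} from a t-table
ci = np.column_stack([beta - t_crit * se, beta + t_crit * se])
```

For this data both the linear and quadratic coefficients have |t| well above 2.447, so each is significant at α = 0.05.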

  44. Prediction of Future Observations • Having fitted a multiple regression model, suppose we wish to predict the future value Y* for a specified vector of predictor values x* = (x₀*, x₁*, …, xₖ*)ᵀ. • One way is to estimate E(Y*) by a confidence interval (CI).

  45. Prediction of Future Observations • The point prediction is Ŷ* = x*ᵀβ̂ = β̂₀ + β̂₁x₁* + … + β̂ₖxₖ*. • The 100(1 − α)% CI for E(Y*) is Ŷ* ± t_{n-(k+1), α/2} s √(x*ᵀ(XᵀX)⁻¹x*). • The 100(1 − α)% prediction interval (PI) for Y* itself is Ŷ* ± t_{n-(k+1), α/2} s √(1 + x*ᵀ(XᵀX)⁻¹x*).
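Both intervals can be computed for the tire data at a chosen x*; a sketch in Python/NumPy (not part of the original slides; predicting at mile = 16 is an illustrative choice, and t_{6, 0.025} ≈ 2.447 is taken from a t-table):

```python
import numpy as np

mile = np.array([0, 4, 8, 12, 16, 20, 24, 28, 32], dtype=float)
depth = np.array([394.33, 329.50, 291.00, 255.17, 229.33,
                  204.83, 179.00, 163.83, 150.33])
X = np.column_stack([np.ones_like(mile), mile, mile ** 2])

n, p = X.shape
V = np.linalg.inv(X.T @ X)
beta = V @ X.T @ depth
resid = depth - X @ beta
s = np.sqrt(resid @ resid / (n - p))        # s estimates sigma

x_star = np.array([1.0, 16.0, 16.0 ** 2])   # predict at mile = 16
y_star = x_star @ beta                      # point prediction Y-hat*
h = x_star @ V @ x_star                     # x*' (X'X)^{-1} x*
t_crit = 2.447                              # t_{6, 0.025}
ci_half = t_crit * s * np.sqrt(h)           # half-width of CI for E(Y*)
pi_half = t_crit * s * np.sqrt(1 + h)       # half-width of PI for Y*
```

The PI is always wider than the CI, since it must also cover the new observation's own random error.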

  46. F-Test for Overall Significance • Consider H₀: β₁ = β₂ = … = βₖ = 0 vs. H₁: βⱼ ≠ 0 for at least one j. • Here H₀ is the overall null hypothesis, which states that none of the predictor variables are related to y. The alternative states that at least one is related.

  47. How to Build an F-Test • The test statistic F = MSR/MSE follows the F-distribution with k and n − (k+1) d.f. under H₀. The α-level test rejects H₀ if F > f_{k, n-(k+1), α}. • Recall that MSR = SSR/k is the regression mean square and MSE = SSE/[n − (k+1)] is the error mean square, with n − (k+1) degrees of freedom.
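The F statistic for the tire model can be assembled from these pieces; a sketch in Python/NumPy (not part of the original slides; the critical value f_{2, 6, 0.05} ≈ 5.14 is hard-coded from an F-table):

```python
import numpy as np

mile = np.array([0, 4, 8, 12, 16, 20, 24, 28, 32], dtype=float)
depth = np.array([394.33, 329.50, 291.00, 255.17, 229.33,
                  204.83, 179.00, 163.83, 150.33])
X = np.column_stack([np.ones_like(mile), mile, mile ** 2])

n, p = X.shape
k = p - 1                                   # number of predictors
beta = np.linalg.inv(X.T @ X) @ X.T @ depth
resid = depth - X @ beta
SSE = resid @ resid
SST = np.sum((depth - depth.mean()) ** 2)
SSR = SST - SSE

MSR = SSR / k                  # regression mean square
MSE = SSE / (n - (k + 1))      # error mean square
F = MSR / MSE                  # overall F statistic
f_crit = 5.14                  # f_{2, 6, 0.05} from an F-table
```

Here F far exceeds the critical value, so the overall regression is highly significant.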

  48. The Relation Between F and r² • F can be written as a function of r². Using r² = SSR/SST, F = (SSR/k)/[SSE/(n − (k+1))] = (r²/k)/[(1 − r²)/(n − (k+1))]. • We see that F is an increasing function of r² and tests the significance of r².
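This identity can be checked numerically on the tire data; a sketch in Python/NumPy (not part of the original slides):

```python
import numpy as np

mile = np.array([0, 4, 8, 12, 16, 20, 24, 28, 32], dtype=float)
depth = np.array([394.33, 329.50, 291.00, 255.17, 229.33,
                  204.83, 179.00, 163.83, 150.33])
X = np.column_stack([np.ones_like(mile), mile, mile ** 2])

n, p = X.shape
k = p - 1
beta = np.linalg.inv(X.T @ X) @ X.T @ depth
resid = depth - X @ beta
SSE = resid @ resid
SST = np.sum((depth - depth.mean()) ** 2)
r2 = 1 - SSE / SST

# F computed two ways: directly as MSR/MSE, and through r^2
F_direct = ((SST - SSE) / k) / (SSE / (n - (k + 1)))
F_from_r2 = (r2 / k) / ((1 - r2) / (n - (k + 1)))
```

The two expressions agree, confirming that the F-test is equivalent to a test of the significance of r².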
