
MATH 2016 (13177) Statistical Modelling

MATH 2016 (13177) Statistical Modelling. Course coordinator: Chris Brien. The course is about designing experiments and using linear models to analyze data, both from experiments and surveys. I. Statistical inference. I.A Expected values and variances. I.B The linear regression model.



Presentation Transcript


  1. MATH 2016 (13177) Statistical Modelling. Course coordinator: Chris Brien. The course is about designing experiments and using linear models to analyze data, both from experiments and surveys. Statistical Modelling Chapter I

  2. I. Statistical inference • I.A Expected values and variances • I.B The linear regression model • I.C Model selection: a) Obtaining parameter estimates; b) Regression analysis of variance • I.D Summary

  3. I.A Expected values and variances • Statistical inference is about drawing conclusions about one or more populations based on samples from those populations. • We compute statistics, or estimates, from samples. • They are used as estimates of particular population quantities, which are called parameters. • It is important to be clear about the distinction: when one is talking about a mean, is it the population mean or the sample mean? • To aid in making the distinction, the convention is to use Greek letters as symbols for parameters and ordinary Roman letters as symbols for statistics. • Fundamental in this course are the population expected value and variance.

  4. Expected value • Expected value = mean of the variable Y in a population; it is a population parameter. • Definition I.1: The expected value of a continuous random variable Y whose population distribution is described by f(y) is given by $E[Y] = \int_{-\infty}^{\infty} y\,f(y)\,dy$. • That is, $\psi_Y = E[Y]$ is the mean in a population whose distribution is described by f(y).

  5. Properties of expected values • Theorem I.1: Let Y be a continuous random variable with probability distribution function f(y). The expected value of a function u(Y) of the random variable is $E[u(Y)] = \int_{-\infty}^{\infty} u(y)\,f(y)\,dy$. • Proof: not given. • Note that any function of a random variable is itself a random variable. • The theorem above is used in the next theorem. • Theorem I.2: (statement not preserved in this transcript)

  6. Proof of Theorem I.2 • For a continuous random variable, we have from Theorem I.1 (derivation not preserved in this transcript). • In particular, (special case not preserved in this transcript).

  7. Variance • Definition I.5: The variance of any random variable Y is defined to be $\mathrm{var}[Y] = \sigma_Y^2 = E[(Y - \psi_Y)^2]$, where $\psi_Y = E[Y]$. • That is, the variance is the mean in the population of the squared deviations of the values from the population mean. • It measures how far, on average, observations are from the mean in the population. • It is also a population parameter.

  8. Variance (cont’d) • Theorem I.3: The variance of a continuous random variable Y whose population distribution is described by f(y) is given by $\mathrm{var}[Y] = \int_{-\infty}^{\infty} (y - \psi_Y)^2 f(y)\,dy$. • Proof: This is a straightforward application of Theorem I.1 where $u(Y) = (Y - \psi_Y)^2$.
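As a rough numerical check of the integral forms of the expected value (Definition I.1) and the variance (Theorem I.3), the sketch below approximates both integrals with the trapezoidal rule. The exponential density, its rate of 0.5 and the helper names are assumptions chosen for illustration; they do not come from the course.

```python
import math

RATE = 0.5  # assumed rate of an illustrative exponential density


def f(y):
    """Exponential density f(y) = RATE * exp(-RATE * y) for y >= 0."""
    return RATE * math.exp(-RATE * y)


def integrate(g, lo, hi, steps=20000):
    """Approximate the integral of g over [lo, hi] with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, steps):
        total += g(lo + k * h)
    return total * h


# The density is negligible beyond y = 100, so the upper limit is truncated there.
LO, HI = 0.0, 100.0

# Expected value: the integral of y * f(y).
expected_value = integrate(lambda y: y * f(y), LO, HI)

# Variance: the integral of (y - E[Y])^2 * f(y).
variance = integrate(lambda y: (y - expected_value) ** 2 * f(y), LO, HI)

print(expected_value, variance)
```

For this density the exact values are 1/RATE = 2 and 1/RATE² = 4, so the printed approximations should be close to those numbers.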

  9. Normal distribution parameters and estimators • Normally distributed variables are common in this course. • The distribution function for such a variable involves the parameters $\psi_Y$ and $\sigma_Y$ as follows: $f(y) = \frac{1}{\sigma_Y\sqrt{2\pi}} \exp\left(-\frac{(y - \psi_Y)^2}{2\sigma_Y^2}\right)$. • So we want to estimate $\psi_Y$ and we have a sample $y_1, y_2, \ldots, y_n$. • Note the lower case y for observed values, as opposed to Y for the random variable. • The obvious estimator of $\psi$ (dropping the subscript) is the sample mean, $\bar{Y} = \sum_{i=1}^{n} Y_i / n$.

  10. Estimators • Note that we call the formula that tells us how to estimate a parameter an estimator; it is a function of random variables, the Ys. • The value obtained by substituting the sample values into the formula is called the estimate; it is a function of observed values, the ys. • It is common practice to denote the estimator as the parameter with a caret over it: $\hat{\psi} = \bar{Y}$ means that the estimator of $\psi$ is $\bar{Y}$. • $\hat{\psi}$ also stands for the estimate, so that $\hat{\psi} = \bar{y}$ means that the estimate of $\psi$ is $\bar{y}$.
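The estimator and estimate distinction maps naturally onto a function and its return value. In the minimal sketch below, the function `sample_mean` plays the role of the estimator (a rule) and the number it returns for one observed sample is the estimate. The sample values are hypothetical.

```python
def sample_mean(values):
    """Estimator: the rule 'add up the values and divide by how many there are'."""
    return sum(values) / len(values)


# Hypothetical observed values y_1, ..., y_n (lower case: observed, not random).
y = [4.1, 5.3, 3.8, 6.0, 4.8]

# Estimate: the number obtained by applying the estimator to the observed sample.
psi_hat = sample_mean(y)
print(psi_hat)
```

A different sample would give a different estimate from the same estimator.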

  11. I.B The linear regression model • We consider models of the general form $Y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_p x_p + \epsilon$, where Y is a continuous random variable and the $x_i$s are quantitative variables that are called the explanatory variables. • This model is a linear model in the $\theta_i$s. • (The slide lists example models that are not preserved in this transcript.) All but the last two are linear in the $\theta_i$s.

  12. The model for n observations • We would conduct a study in which $n$ ($\geq p+1$) observations are taken of Y and the $x_i$s. • This leads to the following system of equations that model the observed responses: $Y_i = \theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i} + \cdots + \theta_p x_{pi} + \epsilon_i$, for $i = 1, \ldots, n$. • What does the model tell us about our data? • We have a response variable, Y, whose values are related to several explanatory variables, the $x_i$s (lower case x because they are not random variables). • The $\epsilon_i$s account for differences in the values of the response variable for the same combination of values of the explanatory variables.

  13. Usual extra assumptions about the $\epsilon_i$s • $E[\epsilon_i] = 0$, $\mathrm{var}[\epsilon_i] = \sigma^2$ and $\mathrm{cov}[\epsilon_i, \epsilon_j] = 0$ for $i \neq j$. • These mean that: • on average the errors cancel out, so that we get the population value of the response; • the variability of the errors is independent of the values of any of the variables; • the error in one observation is unrelated to that of any other observation. • The last assumption involves a third quantity involving expectations: covariance.

  14. Covariance • Definition I.6: The covariance of two random variables, X and Y, is defined to be $\mathrm{cov}[X, Y] = E[(X - E[X])(Y - E[Y])]$. • The covariance measures the extent to which the values of the two random variables move together. • In fact, the linear correlation coefficient can be calculated from it as follows: $\mathrm{corr}[X, Y] = \mathrm{cov}[X, Y] / \sqrt{\mathrm{var}[X]\,\mathrm{var}[Y]}$. • That is, the correlation coefficient is just the covariance adjusted, or standardized, for the variances of X and Y.
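The relationship between covariance and correlation can be illustrated with sample analogues of the population quantities. The sketch below uses NumPy and hypothetical paired data; it standardizes the sample covariance by the two sample variances and compares the result with NumPy's own correlation routine.

```python
import numpy as np

# Hypothetical paired observations on two variables.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# np.cov returns the 2 x 2 sample variance matrix; the off-diagonal is cov[x, y].
cov_xy = np.cov(x, y)[0, 1]

# Sample variances with the same divisor (n - 1) that np.cov uses by default.
var_x = np.var(x, ddof=1)
var_y = np.var(y, ddof=1)

# Correlation = covariance standardized by the two variances.
corr_xy = cov_xy / np.sqrt(var_x * var_y)

print(corr_xy, np.corrcoef(x, y)[0, 1])
```

The two printed values should agree, since `np.corrcoef` performs the same standardization.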

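  15. Matrix notation for the system of equations • Matrices are in bold upper case letters. • Vectors are in bold lower case, except that vectors of random variables are in upper case. • Thus, in matrix terms, let $Y = (Y_1, \ldots, Y_n)'$, $\theta = (\theta_0, \theta_1, \ldots, \theta_p)'$, $\epsilon = (\epsilon_1, \ldots, \epsilon_n)'$ and let $X$ be the $n \times (p+1)$ matrix whose $i$th row is $(1, x_{1i}, \ldots, x_{pi})$. • The system of equations can be written $Y = X\theta + \epsilon$.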

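  16. Expectation and variance of a random vector • Definition I.7: Let Y be a vector of n jointly-distributed random variables with $E[Y_i] = \psi_i$, $\mathrm{var}[Y_i] = \sigma_i^2$ and $\mathrm{cov}[Y_i, Y_j] = \sigma_{ij}$. • Then, the random vector is $Y = (Y_1, Y_2, \ldots, Y_n)'$. • The expectation vector, $\psi$, giving the expectation of Y is $E[Y] = \psi = (\psi_1, \psi_2, \ldots, \psi_n)'$.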

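  17. Expectation and variance of a random vector (cont’d) • The variance matrix, V, giving the variance of Y is the $n \times n$ matrix with diagonal elements $\sigma_i^2$ and off-diagonal elements $\sigma_{ij}$, that is, $V = E[(Y - \psi)(Y - \psi)']$. • Note the transpose in the last expression.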

  18. Lemma I.1: The transpose of a matrix (selected properties) • The transpose of a column vector is a row vector and vice versa, so we always write the column vector as untransposed and the row vector as transposed: $a$ is a column vector and $a'$ is the corresponding row vector. • (Property 7) The transpose of a product is the product of the transposes, but with the order of the matrices reversed: $(AB)' = B'A'$. • (Property 9) A column vector premultiplied by its transpose is the sum of squares of its elements, a scalar: $a'a = \sum_{i=1}^{n} a_i^2$. • (Property 10) A column vector of order n postmultiplied by its transpose is a symmetric matrix of order $n \times n$: from property 7 we have $(aa')' = (a')'a' = aa'$. • In particular, property 10 applies to V in Definition I.7 and tells us that V is an $n \times n$ symmetric matrix.
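The transpose properties in Lemma I.1 are easy to check numerically. The sketch below uses NumPy with small hypothetical matrices to confirm that the transpose of a product reverses the order, that a column vector premultiplied by its transpose gives the sum of squares, and that a column vector postmultiplied by its transpose gives a symmetric matrix.

```python
import numpy as np

# Hypothetical column vector and conformable matrices.
a = np.array([[1.0], [2.0], [3.0]])                      # 3 x 1
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])       # 3 x 2
B = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]])         # 2 x 3

# Transpose of a product: (AB)' = B'A'.
product_rule_holds = np.allclose((A @ B).T, B.T @ A.T)

# a'a is a 1 x 1 matrix holding the sum of squares of the elements of a.
sum_of_squares = (a.T @ a).item()

# aa' is an n x n matrix that equals its own transpose.
outer = a @ a.T
outer_is_symmetric = np.allclose(outer, outer.T)

print(product_rule_holds, sum_of_squares, outer.shape, outer_is_symmetric)
```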

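  19. Model for expectation & variance • We have a model for Y with conditions on $\epsilon$. • Find expressions for the elements of E[Y] and var[Y]. • Thus, $E[Y_i] = \theta_0 + \theta_1 x_{1i} + \cdots + \theta_p x_{pi}$, $\mathrm{var}[Y_i] = \sigma^2$ and $\mathrm{cov}[Y_i, Y_j] = 0$ for $i \neq j$.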

  20. Model in terms of expectation and variance (cont’d) • In matrix terms, the alternative expression for the model is $E[Y] = X\theta$ and $\mathrm{var}[Y] = V = \sigma^2 I_n$. • That is, $V_\epsilon = \sigma^2 I_n$ is also the variance matrix for Y.

  21. Example I.1 House price • Suppose it is thought that the price obtained for a house depends primarily on its age and livable area. • Observe 5 randomly selected houses on the market (the data table is not preserved in this transcript). • In this example, n = 5 and p = 2.

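  22. Model proposed for data • In matrix terms, the model is $Y = X\theta + \epsilon$ with $E[\epsilon] = 0$ and $\mathrm{var}[\epsilon] = \sigma^2 I_5$, • or, equivalently, $E[Y_i] = \theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i}$, $\mathrm{var}[Y_i] = \sigma^2$ and $\mathrm{cov}[Y_i, Y_j] = 0$ for $i \neq j$, • or, equivalently, $E[Y] = X\theta$ and $V = \sigma^2 I_5$.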

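  23. Model matrices for example • The X matrix for this example is $5 \times 3$, with a column of ones and one column for each of age and area (the matrices themselves are not preserved in this transcript). • We also have the vector, y, of observed values of Y.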

  24. Example I.2 Voter turnout • In this example a political scientist attempted to investigate the relationship between campaign expenditures on televised advertisements and subsequent voter turnout. • Aim to predict voter turnout from advertising expenditure.

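  25. Proposed model • Simple linear regression, as there is only 1 explanatory variable. • Drop the subscript for the independent variable: $E[Y_i] = \theta_0 + \theta_1 x_i$, $\mathrm{var}[Y_i] = \sigma^2$ and $\mathrm{cov}[Y_i, Y_j] = 0$ for $i \neq j$. • How should data behave for this model? • $E[Y_i]$ specifies the population mean. • $\mathrm{var}[Y_i]$ specifies the variability around the population mean. • $\mathrm{cov}[Y_i, Y_j]$ specifies the relationship between pairs of observations.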
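One way to see how data should behave under a simple linear regression model, with $E[Y_i] = \theta_0 + \theta_1 x_i$, constant variance $\sigma^2$ and zero covariance between observations, is to simulate many data sets from it. The sketch below is only an illustration: the parameter values, the expenditure values and the use of normally distributed errors are all assumptions, not values from the voter turnout example.

```python
import numpy as np

rng = np.random.default_rng(0)

theta0, theta1, sigma = 10.0, 2.0, 2.0   # assumed parameter values
x = np.array([1.0, 2.0, 3.0, 4.0])       # assumed values of the explanatory variable
n_reps = 20000                           # number of simulated data sets

# Each row is one simulated data set (Y_1, ..., Y_4) with independent errors.
errors = rng.normal(loc=0.0, scale=sigma, size=(n_reps, x.size))
Y = theta0 + theta1 * x + errors

# Averages over data sets approximate E[Y_i], var[Y_i] and cov[Y_1, Y_2].
means = Y.mean(axis=0)
variances = Y.var(axis=0, ddof=1)
cov_12 = np.cov(Y[:, 0], Y[:, 1])[0, 1]

print(means, variances, cov_12)
```

The simulated means should sit near the line $\theta_0 + \theta_1 x_i$, the variances near $\sigma^2 = 4$ for every $x_i$, and the covariance near zero.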

  26. Scatter diagram for Turnout versus Expend (figure not preserved in this transcript) • Does it look like the model will describe this situation?

  27. I.C Model selection • Generally, we want to determine the model that best describes the data. • To do this we usually obtain estimates of our parameters under several alternative models and use these in deciding which model to use to describe the data. • The choice of models is often made using an analysis of variance (ANOVA).

  28. a) Obtaining parameter estimates • Estimators for the parameters $\theta$ in the expectation model are obtained using the least squares or maximum likelihood criteria; they are equivalent in the context of linear models with normally distributed errors. • Also, an estimator for $\sigma^2$ is obtained from the ANOVA described in the next section. • Here we will establish the least squares estimators for $\theta$.

  29. Least squares estimators • Definition I.8: Let $Y = X\theta + \epsilon$ where • X is an $n \times q$ matrix with $n \geq q$, • $\theta$ is a $q \times 1$ vector of unknown parameters, • $\epsilon$ is an $n \times 1$ vector of errors with mean 0 and variance $\sigma^2 I_n$, and $q = p + 1$. The ordinary least squares (OLS) estimator of $\theta$ is the value of $\theta$ that minimizes $\epsilon'\epsilon = (Y - X\theta)'(Y - X\theta)$. • Note that • $\epsilon'\epsilon$ is of the form described in property 9 of Lemma I.1 • and is a scalar that is the sum of squares of the elements of $\epsilon$, or the sum of squares of the "errors".

  30. Least squares estimators for $\theta$ • Theorem I.5: Let $Y = X\theta + \epsilon$ where • Y is an $n \times 1$ vector of random variables for the observations, • X is an $n \times q$ matrix of full rank with $n \geq q$, • $\theta$ is a $q \times 1$ vector of unknown parameters, • $\epsilon$ is an $n \times 1$ vector of errors with mean 0 and variance $\sigma^2 I_n$, and $q = p + 1$. The ordinary least squares estimator for $\theta$ is given by $\hat{\theta} = (X'X)^{-1}X'Y$. • (The ‘^’ denotes estimator.) • Proof: see notes.
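The proof of Theorem I.5 is left to the course notes. A standard argument, sketched here and not necessarily the one used in the notes, expands the sum of squares from Definition I.8, differentiates with respect to $\theta$ and sets the derivative to zero. The symbol $S(\theta)$ is introduced only for this sketch.

```latex
\begin{align*}
S(\theta) &= (Y - X\theta)'(Y - X\theta)
           = Y'Y - 2\theta'X'Y + \theta'X'X\theta, \\
\frac{\partial S}{\partial \theta} &= -2X'Y + 2X'X\theta . \\
\intertext{Setting the derivative to zero gives the normal equations}
X'X\hat{\theta} &= X'Y . \\
\intertext{Because $X$ is of full rank $q$, the $q \times q$ matrix $X'X$ is nonsingular, so}
\hat{\theta} &= (X'X)^{-1}X'Y .
\end{align*}
```

The second derivative, $2X'X$, is positive definite when $X$ is of full rank, so the stationary point is a minimum.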

  31. Least squares estimates of $\theta$ • For a particular example, we will have an observed vector y; substitute this into the estimator to yield the estimate for that example: $\hat{\theta} = (X'X)^{-1}X'y$. • Note the dual use of $\hat{\theta}$ to denote the estimator and the estimate.
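Substituting an observed vector y into the ordinary least squares estimator, $(X'X)^{-1}X'y$, can be sketched with NumPy as follows. The house data below are hypothetical values made up for illustration (the course's own table is not available here), and the variable names are likewise assumptions.

```python
import numpy as np

# Hypothetical data for 5 houses (not taken from the course).
age = np.array([1.0, 5.0, 5.0, 10.0, 20.0])
area = np.array([1.0, 1.0, 2.0, 2.0, 3.0])
y = np.array([50.0, 40.0, 52.0, 47.0, 65.0])   # observed prices

# X has a column of ones for the intercept plus one column per explanatory variable.
X = np.column_stack([np.ones_like(age), age, area])

# Solve the normal equations X'X theta = X'y instead of forming the inverse explicitly.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values and residuals follow from the estimate.
fitted = X @ theta_hat
residuals = y - fitted

print(theta_hat)
```

Solving the normal equations with `np.linalg.solve` gives the same estimate as the inverse formula but is usually better behaved numerically.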

  32. What does full rank mean? • Definition I.9: The rank of an $n \times q$ matrix A with $n \geq q$ is the number of linearly independent columns of the matrix. The matrix is said to be of full rank, or rank q, if none of the columns in the matrix can be written as a linear combination of the other columns. • Example I.1 House price (continued): For this example the X matrix (not preserved in this transcript) is of rank 3 and so is of full rank, as no column can be written as a linear combination of the other two.

  33. Another example • On the other hand, the following two matrices (not preserved in this transcript) are of rank 2, as their second columns are 5 × (column 3) and 5 × (column 3) − 9 × (column 1), respectively.
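Rank deficiency of this kind can be checked numerically. The sketch below builds three hypothetical 5 × 3 matrices (the course's own matrices are not available here): one of full rank, one whose second column is 5 times the third, and one whose second column is 5 times the third minus 9 times the first. It then asks NumPy for their ranks.

```python
import numpy as np

ones = np.ones(5)
col3 = np.array([1.0, 1.0, 2.0, 2.0, 3.0])   # hypothetical third column

# Full rank: no column is a linear combination of the others.
X_full = np.column_stack([ones, [1.0, 5.0, 5.0, 10.0, 20.0], col3])

# Second column = 5 * (column 3).
X_a = np.column_stack([ones, 5.0 * col3, col3])

# Second column = 5 * (column 3) - 9 * (column 1).
X_b = np.column_stack([ones, 5.0 * col3 - 9.0 * ones, col3])

ranks = [int(np.linalg.matrix_rank(M)) for M in (X_full, X_a, X_b)]
print(ranks)
```

For the two deficient matrices $X'X$ is singular, so the inverse in the least squares estimator does not exist.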

  34. Fitted values and residuals • Definition I.10: The estimator of the fitted values for the expectation model $E[Y] = \psi = X\theta$ is given by $\hat{\psi} = X\hat{\theta}$, • and so the estimates are computed by substituting the values of the estimates and the explanatory variables into the fitted equation. • Definition I.11: The estimator of the residuals for the expectation model $E[Y] = \psi = X\theta$ is given by $\hat{\epsilon} = Y - \hat{\psi} = Y - X\hat{\theta}$, • and so the estimates are computed by subtracting the fitted values from the observed values of the response variable.

  35. Recap thus far • Often we want to decide between two models. • Fit the models using least squares. • Want to use ANOVA to select between the alternatives. • For the model $Y = X\theta + \epsilon$, or $E[Y] = \psi = X\theta$ and $V = \sigma^2 I$, the ordinary least squares estimator for $\theta$ is given by $\hat{\theta} = (X'X)^{-1}X'Y$. • The estimator of the fitted values for the expectation model is given by $\hat{\psi} = X\hat{\theta}$ • and that of the residuals is given by $\hat{\epsilon} = Y - X\hat{\theta}$. • Least squares can be viewed as the orthogonal projection of the data vector, in the n-dimensional data space, into both the model and residual subspaces using the projection matrices $Q_M$ and $Q_R$.

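  36. Residuals as a linear combination • Given the expression for the fitted values, $\hat{\psi} = X\hat{\theta} = X(X'X)^{-1}X'Y = Q_M Y$, the residuals are given by $\hat{\epsilon} = Y - Q_M Y = (I_n - Q_M)Y = Q_R Y$, where $Q_R = I_n - Q_M$.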

  37. Projection operators: $Q_M$ • We have seen that $\hat{\psi} = Q_M Y$. • $Q_M$ is an $n \times n$ projection matrix with the property that it is symmetric and idempotent. • Definition I.12: A matrix E is idempotent if $E^2 = E$. • Given that X is an $n \times q$ matrix, • then $Q_M = X(X'X)^{-1}X'$ • is the product of $n \times q$, $q \times q$ and $q \times n$ matrices, • with the result that it is an $n \times n$ matrix. • Clearly the product of the $n \times n$ matrix $Q_M$ and the $n \times 1$ vector Y is an $n \times 1$ vector. • So the estimator of the fitted values is a linear combination of the elements of Y.
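That $Q_M = X(X'X)^{-1}X'$ is symmetric and idempotent can be verified directly from its definition. The following is a short sketch of that verification, using only the rule that the transpose of a product reverses the order of the factors and the fact that $X'X$, and hence its inverse, is symmetric.

```latex
\begin{align*}
Q_M' &= \left( X (X'X)^{-1} X' \right)'
      = (X')' \left( (X'X)^{-1} \right)' X'
      = X (X'X)^{-1} X' = Q_M , \\
Q_M^2 &= X (X'X)^{-1} X' X (X'X)^{-1} X'
       = X (X'X)^{-1} \left( X'X \right) (X'X)^{-1} X'
       = X (X'X)^{-1} X' = Q_M .
\end{align*}
```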

  38. Projection operators: $Q_R$ • Theorem I.6: Given that the matrix E is symmetric and idempotent, • then $R = I - E$ is also symmetric and idempotent. • In addition, $RE = ER = 0$. • Application of this theorem to the regression situation, with $E = Q_M$ and $R = Q_R = I - Q_M$, leads us to conclude that • $Q_R$ is symmetric and idempotent • with $Q_R Q_M = Q_M Q_R = 0$. • All of this can be viewed as the orthogonal projection of vectors onto subspaces.

  39. Geometry of least squares • The observation vector y is viewed as a vector in n-space, and this space is called the data space. • Then the X matrix, with q linearly independent columns, determines a q-dimensional subspace of the data space; this space is called the model (sub)space.

  40. Geometry of least squares (cont’d) • The fitted values are the orthogonal projection of the observation vector into the model space. • The orthogonal projection is achieved using the idempotent, or projection, matrix $Q_M$. • The residuals are the projection of the observation vector into the residual subspace, the subspace of the data space orthogonal to the model space. • The matrix that projects onto the residual subspace is $Q_R$. • That $Q_R Q_M = Q_M Q_R = 0$ reflects that the two subspaces are orthogonal.

  41. Projectors properties • Once you have projected y into the model subspace and obtained $Q_M y$, it is in the model subspace. • Applying $Q_M$ to the fitted values, that is to $Q_M y$, will have no effect because they are already in the model subspace; • clearly, $Q_M(Q_M y) = Q_M y$, so it is obvious why $Q_M^2 = Q_M$. • A similar argument applies to $Q_R$. • Also, it should be clear why $Q_R Q_M = 0$.
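The projector properties can be confirmed numerically for a concrete case. The sketch below forms $Q_M = X(X'X)^{-1}X'$ and $Q_R = I - Q_M$ for a hypothetical 5 × 3 matrix X and response vector y (illustrative values only, not course data) and checks symmetry, idempotency and orthogonality up to rounding error.

```python
import numpy as np

# Hypothetical X (intercept plus two explanatory variables) and observed y.
X = np.column_stack([
    np.ones(5),
    [1.0, 5.0, 5.0, 10.0, 20.0],
    [1.0, 1.0, 2.0, 2.0, 3.0],
])
y = np.array([50.0, 40.0, 52.0, 47.0, 65.0])
n = X.shape[0]

# Projector onto the model subspace and onto the residual subspace.
Q_M = X @ np.linalg.inv(X.T @ X) @ X.T
Q_R = np.eye(n) - Q_M

fitted = Q_M @ y
residuals = Q_R @ y

checks = {
    "Q_M symmetric": np.allclose(Q_M, Q_M.T),
    "Q_M idempotent": np.allclose(Q_M @ Q_M, Q_M),
    "Q_R idempotent": np.allclose(Q_R @ Q_R, Q_R),
    "Q_R Q_M = 0": np.allclose(Q_R @ Q_M, 0.0, atol=1e-9),
    "projecting fitted values again changes nothing": np.allclose(Q_M @ fitted, fitted),
    "fitted and residual vectors orthogonal": abs(fitted @ residuals) < 1e-6,
}
print(checks)
```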

  42. Example I.3 Single sample • Suppose that a single sample of 3 observations has been obtained. • The linear model we propose for these data is that $Y = 1_3\mu + \epsilon$, so that $E[Y] = 1_3\mu$, • or, for an individual observation, $Y_i = \mu + \epsilon_i$. • That is, the value for an observation is made up of • the population mean • plus a particular deviation from the population mean for that observation.

  43. Projection matrix • In this case $Q_M$, a $3 \times 3$ matrix, is rather simple, as $X = 1_3$ • and $Q_M = 1_3(1_3'1_3)^{-1}1_3' = \frac{1}{3}1_3 1_3'$, the matrix with every element equal to 1/3.

  44. Grand mean operator • Note that • the estimator of $\mu$ in our model is the mean of the elements of Y, $\hat{\mu} = \bar{Y}$, • and the estimate is the mean of the observations, $\bar{y}$. • That is, in this case, $Q_M$ is the matrix that replaces each observation with the grand mean of all the observations. • Throughout this course the vector of grand means will be denoted as (symbol not preserved in this transcript).

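  45. A simple 3-D example • Suppose that y is a particular vector of 3 observations (the values are not preserved in this transcript). • Then fitting the model $E[Y] = 1_n\mu$ results in • fitted values $Q_M y$ • and residuals $Q_R y$. • [Figure: the fitted vector lies in the model subspace and the residual vector lies in the residual subspace, which is orthogonal to the model subspace. The 1st data point is plotted on the axis coming out of the figure, the 2nd on the axis going across and the 3rd on the axis going up.]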
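The single-sample case can be reproduced in a few lines. In the sketch below the three observations are hypothetical, since the values used on the slide are not available; the projector is the 3 × 3 matrix with every element equal to 1/3, which replaces each observation with the grand mean.

```python
import numpy as np

n = 3
ones = np.ones((n, 1))

# Grand mean operator: every element equals 1/n.
Q_M = ones @ ones.T / n
Q_R = np.eye(n) - Q_M

# Hypothetical observations (illustrative only).
y = np.array([2.0, 1.0, 6.0])

fitted = Q_M @ y        # each element is the mean of y
residuals = Q_R @ y     # deviations of the observations from the mean

print(fitted, residuals, fitted @ residuals)
```

With these values the mean is 3, so the fitted vector is (3, 3, 3), the residual vector is (−1, −2, 3), and their inner product is zero up to rounding, reflecting the orthogonality of the model and residual subspaces.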

  46. I.C Model selection • Generally, we want to determine the model that best describes the data. • Obtain estimates of our parameters under several alternative models. • Choose the model using an analysis of variance (ANOVA).

  47. b) Regression analysis of variance • An ANOVA is used to compare potential models. • In the case of the regression model, it is common to want to choose between two expectation models, one which is a subset of the other.

  48. Testing all expectation parameters are zero • The simplest, although not necessarily the most useful, situation is where one compares the expectation models $E[Y_i] = 0$ and $E[Y_i] = \theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i}$. • So we first state the null and alternative hypotheses for the hypothesis test. • $H_0$: $\theta = 0$ (equivalent to $E[Y_i] = 0$) • $H_1$: $\theta \neq 0$ (equivalent to $E[Y_i] = \theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i}$)

  49. Computing test statistic using ANOVA table • Generally, an ANOVA comparing two models involves • the SSqs of the fitted values for the null model and • the difference between the SSqs of the fitted values for the two models. • In this case (the ANOVA table is not preserved in this transcript), • the fitted values for the null model are all 0, and so the difference in SSqs is equal to the SSq of the fitted values of the alternative model. • Could leave Model0 out of the table altogether. • Note the use of $s^2$, the symbol for variance, for MSqs, • because MSqs are variances (the ratio of a SSq to its df).

  50. Computing test statistic using ANOVA table • Two parallel identities: • Obviously Total df = Model df + Residual df. • Not so clear is that Total SSq = Model SSq + Residual SSq (but remember the geometry). • The SSqs are just those of the fitted values, the residuals or Y. • If the p-value is less than the significance level, $\alpha$, then $H_0$ is rejected. Usually, $\alpha = 0.05$.
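The two parallel identities and the resulting F statistic can be sketched numerically. The example below tests the hypothesis that all expectation parameters are zero, so the Total SSq is the uncorrected sum of squares y'y with n degrees of freedom. The data are hypothetical and the variable names are assumptions made for this illustration.

```python
import numpy as np

# Hypothetical data: intercept plus two explanatory variables, 5 observations.
X = np.column_stack([
    np.ones(5),
    [1.0, 5.0, 5.0, 10.0, 20.0],
    [1.0, 1.0, 2.0, 2.0, 3.0],
])
y = np.array([50.0, 40.0, 52.0, 47.0, 65.0])
n, q = X.shape

# Project y into the model and residual subspaces.
Q_M = X @ np.linalg.inv(X.T @ X) @ X.T
fitted = Q_M @ y
residuals = y - fitted

# Sums of squares of Y, the fitted values and the residuals.
total_ssq = float(y @ y)
model_ssq = float(fitted @ fitted)
residual_ssq = float(residuals @ residuals)

# Degrees of freedom: the dimensions of the corresponding subspaces.
total_df, model_df, residual_df = n, q, n - q

# Mean squares are SSq / df; the F statistic is their ratio.
model_msq = model_ssq / model_df
residual_msq = residual_ssq / residual_df
f_statistic = model_msq / residual_msq

print(total_ssq, model_ssq + residual_ssq, f_statistic)
```

The p-value would then come from the F distribution with (model_df, residual_df) degrees of freedom, for example via `scipy.stats.f.sf(f_statistic, model_df, residual_df)` if SciPy is available.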
