Matrix Algebra and Linear Models


  1. Matrix Algebra and Linear Models

  2. Matrices: An array of elements. The dimensionality of a matrix is r x c (rows x columns) -- think of Railroad Car. A matrix is defined by a list of its elements; B has ij-th element Bij, the element in row i and column j.

Vectors are usually written in bold lowercase. For example,
a = (12, 13, 47)^T is a column vector (3 x 1), and b = (2, 0, 5, 21) is a row vector (1 x 4).

General matrices are usually written in bold uppercase. For example (writing rows separated by semicolons),
C = [3 1 2; 2 5 4; 1 1 2] is a 3 x 3 square matrix, and D = [0 1; 3 4; 2 9] is 3 x 2.
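As an aside (not part of the original slides), these example vectors and matrices can be entered directly as numpy arrays, whose shapes follow the same r x c convention:

```python
import numpy as np

a = np.array([[12], [13], [47]])      # 3 x 1 column vector
b = np.array([[2, 0, 5, 21]])         # 1 x 4 row vector
C = np.array([[3, 1, 2],
              [2, 5, 4],
              [1, 1, 2]])             # 3 x 3 square matrix
D = np.array([[0, 1],
              [3, 4],
              [2, 9]])                # 3 x 2 matrix

print(a.shape, b.shape, C.shape, D.shape)   # (3, 1) (1, 4) (3, 3) (3, 2)
print(C[1, 2])   # element in row 2, column 3 of C (numpy counts from 0), i.e. C23 = 4
```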

  3. Addition and Subtraction of Matrices

If two matrices have the same dimension (same number of rows and columns), then matrix addition and subtraction simply follow on an element-by-element basis:
Matrix addition: (A + B)ij = Aij + Bij
Matrix subtraction: (A - B)ij = Aij - Bij

Examples: for
A = [3 0; 1 2] and B = [1 2; 2 1],
C = A + B = [4 2; 3 3] and D = A - B = [2 -2; -1 1]

  4. Partitioned Matrices

It will often prove useful to divide (or partition) the elements of a matrix into a matrix whose elements are themselves matrices. One can partition a matrix in many ways. For example,
C = [3 1 2; 2 5 4; 1 1 2] = [a b; d B]
where a = (3), b = (1 2), d = (2, 1)^T, and B = [5 4; 1 2].
A matrix can also be written as a column vector whose elements are row vectors, or as a row vector whose elements are column vectors.

  5. Towards Matrix Multiplication: dot products

The dot (or inner) product of two vectors a and b (both of length n) is defined as
a . b = sum_{i=1}^n ai bi

Example: for a = (1, 2, 3, 4)^T and b = (4, 5, 7, 9),
a . b = 1*4 + 2*5 + 3*7 + 4*9 = 71
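A one-line numpy check of this sum (a sketch using nothing beyond the slide's own numbers):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 5, 7, 9])

# dot product: 1*4 + 2*5 + 3*7 + 4*9
print(np.dot(a, b))   # 71
print(a @ b)          # the @ operator gives the same result
```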

  6. Towards Matrix Multiplication: Systems of Equations

Matrix multiplication is defined in such a way as to compactly represent and solve linear systems of equations. The least-squares solution for the linear model
y = m + b1 z1 + b2 z2 + ... + bn zn
yields the following system of equations for the bi:
s(y, z1) = b1 s2(z1) + b2 s(z1, z2) + ... + bn s(z1, zn)
s(y, z2) = b1 s(z1, z2) + b2 s2(z2) + ... + bn s(z2, zn)
  ...
s(y, zn) = b1 s(z1, zn) + b2 s(z2, zn) + ... + bn s2(zn)
This can be written much more compactly in matrix form.

  7. Matrix Multiplication: Quick overview

For the product of two matrices to exist, the matrices must conform. For AB, the number of columns of A must equal the number of rows of B. The matrix C = AB then has the same number of rows as A and the same number of columns as B:
C (r x c) = A (r x k) B (k x c)
The outer indices give the dimensions of the resulting matrix (r rows from A, c columns from B); the inner indices must match (columns of A = rows of B).

The ij-th element of C is the dot product of the elements in the ith row of A with the elements in the jth column of B:
Cij = sum_{l=1}^k Ail Blj

The order in which matrices are multiplied affects the matrix product; in general AB != BA.
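A small numpy sketch of conformability and of the fact that order matters, reusing the 2 x 2 matrices from slide 3:

```python
import numpy as np

A = np.array([[3, 0],
              [1, 2]])
B = np.array([[1, 2],
              [2, 1]])

print(A @ B)   # [[3 6] [5 4]]
print(B @ A)   # [[5 4] [7 2]]  -- AB != BA

# a non-conforming product raises an error: (3 x 2)(2 x 2) is fine,
# but (2 x 2)(3 x 2) is not
D = np.array([[0, 1], [3, 4], [2, 9]])
print(D @ A)   # conforms: result is 3 x 2
try:
    A @ D
except ValueError as err:
    print("does not conform:", err)
```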

  8. More formally, consider the product L = MN, where M is r x c and N is c x b.

Express the matrix M as a column vector of row vectors:
M = (m1; m2; ...; mr), where mi = (Mi1 Mi2 ... Mic)

Likewise express N as a row vector of column vectors:
N = (n1 n2 ... nb), where nj = (N1j, N2j, ..., Ncj)^T

The ij-th element of L is the inner product of M's row i with N's column j:
L = [m1.n1  m1.n2 ... m1.nb;
     m2.n1  m2.n2 ... m2.nb;
      ...
     mr.n1  mr.n2 ... mr.nb]

  9. Examples

For A = [a b; c d] and B = [e f; g h],
AB = [ae + bg, af + bh; ce + dg, cf + dh]
Likewise,
BA = [ea + fc, eb + fd; ga + hc, gb + hd]
so in general AB != BA.

  10. The Transpose of a Matrix

The transpose of a matrix exchanges its rows and columns: (A^T)ij = Aji.
Useful identities: (AB)^T = B^T A^T and (ABC)^T = C^T B^T A^T.

For column vectors a = (a1, ..., an)^T and b = (b1, ..., bn)^T:

Inner product: a^T b = a^T (1 x n) times b (n x 1). The indices match, the matrices conform, and the dimension of the resulting product is 1 x 1 (i.e., a scalar):
a^T b = sum_{i=1}^n ai bi
Note that b^T a = (a^T b)^T = a^T b.

Outer product: a b^T = a (n x 1) times b^T (1 x n). The resulting product is an n x n matrix whose ij-th element is ai bj:
a b^T = [a1 b1, a1 b2, ..., a1 bn;  a2 b1, a2 b2, ..., a2 bn;  ...;  an b1, an b2, ..., an bn]
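A quick numpy check of the transpose identities and the inner/outer product dimensions (illustrative only; the matrices are random):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))
a = rng.normal(size=(5, 1))   # column vectors
b = rng.normal(size=(5, 1))

print(np.allclose((A @ B).T, B.T @ A.T))   # True: (AB)^T = B^T A^T

print((a.T @ b).shape)   # (1, 1): inner product is a scalar
print((a @ b.T).shape)   # (5, 5): outer product is n x n
print(np.allclose(b.T @ a, a.T @ b))       # True: b^T a = a^T b
```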

  11. The Identity Matrix, I

The identity matrix serves the role of the number 1 in matrix multiplication: AI = A, IA = A.
I is a diagonal matrix, with all diagonal elements being one and all off-diagonal elements zero:
Iij = 1 for i = j, and Iij = 0 otherwise.

  12. The Inverse Matrix, A^-1

For a square matrix A, define the inverse of A, A^-1, as the matrix satisfying
A^-1 A = A A^-1 = I

For a 2 x 2 matrix A = [a b; c d],
A^-1 = 1/(ad - bc) [d -b; -c a]
If the quantity ad - bc (the determinant) is zero, the inverse does not exist.

If det(A) is not zero, A^-1 exists and A is said to be non-singular. If det(A) = 0, A is singular, and no unique inverse exists (generalized inverses do).
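A numerical sketch of the 2 x 2 inverse formula, checked against numpy's general-purpose inverse:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [1.0, 2.0]])

det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]          # ad - bc
A_inv = np.array([[ A[1, 1], -A[0, 1]],
                  [-A[1, 0],  A[0, 0]]]) / det

print(np.allclose(A_inv, np.linalg.inv(A)))          # True
print(np.allclose(A @ A_inv, np.eye(2)))             # A A^-1 = I

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.det(S))   # 0: S is singular, so no unique inverse exists
```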

  13. Useful identities: (AB)^-1 = B^-1 A^-1 and (A^T)^-1 = (A^-1)^T.

A key use of inverses is solving systems of equations
Ax = c
where A is a matrix of known constants, c is a vector of known constants, and x is the vector of unknowns we wish to solve for. Premultiplying by A^-1 gives A^-1 A x = A^-1 c, and since A^-1 A x = Ix = x, the solution is x = A^-1 c.

The least-squares system from slide 6 has exactly this form, Vb = c:
[ s2(z1)    s(z1,z2)  ...  s(z1,zn) ] [ b1 ]   [ s(y,z1) ]
[ s(z1,z2)  s2(z2)    ...  s(z2,zn) ] [ b2 ] = [ s(y,z2) ]
[   ...       ...     ...    ...    ] [ .. ]   [   ...   ]
[ s(z1,zn)  s(z2,zn)  ...  s2(zn)   ] [ bn ]   [ s(y,zn) ]
Hence Vb = c ---> b = V^-1 c.

  14. Example: solve the OLS for b in y = a + b1 z1 + b2 z2 + e

Here b = V^-1 c, with
V = [ s2(z1)  s(z1,z2);  s(z1,z2)  s2(z2) ]  and  c = ( s(y,z1);  s(y,z2) )

It is more compact to use s(z1,z2) = r12 s(z1) s(z2). Then
V^-1 = 1 / ( s2(z1) s2(z2) (1 - r12^2) ) [ s2(z2)  -s(z1,z2);  -s(z1,z2)  s2(z1) ]
and so
( b1; b2 ) = 1 / ( s2(z1) s2(z2) (1 - r12^2) ) [ s2(z2)  -s(z1,z2);  -s(z1,z2)  s2(z1) ] ( s(y,z1); s(y,z2) )

  15. Carrying out the multiplication gives
b1 = 1/(1 - r12^2) [ s(y,z1)/s2(z1) - r12 s(y,z2)/( s(z1) s(z2) ) ]
b2 = 1/(1 - r12^2) [ s(y,z2)/s2(z2) - r12 s(y,z1)/( s(z1) s(z2) ) ]
If r12 = 0, these reduce to the two univariate slopes,
b1 = s(y,z1)/s2(z1)  and  b2 = s(y,z2)/s2(z2)
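A numerical sketch tying slides 13-15 together. The (co)variances below are made-up values, not from the slides; the point is that solving Vb = c directly agrees with the closed-form slopes above:

```python
import numpy as np

# assumed example values: s2(z1), s2(z2), r12, s(y,z1), s(y,z2)
s2_z1, s2_z2, r12 = 4.0, 9.0, 0.5
s_z1z2 = r12 * np.sqrt(s2_z1 * s2_z2)
s_yz1, s_yz2 = 3.0, 1.5

V = np.array([[s2_z1,  s_z1z2],
              [s_z1z2, s2_z2]])
c = np.array([s_yz1, s_yz2])

print(np.linalg.solve(V, c))   # b = V^-1 c, about [0.833, -0.111]

# closed-form slopes from slide 15
denom = 1 - r12**2
b1 = (s_yz1 / s2_z1 - r12 * s_yz2 / np.sqrt(s2_z1 * s2_z2)) / denom
b2 = (s_yz2 / s2_z2 - r12 * s_yz1 / np.sqrt(s2_z1 * s2_z2)) / denom
print(b1, b2)                  # same values
```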

  16. Quadratic and Bilinear Forms

Quadratic product: for A (n x n) and x (n x 1),
x^T A x = sum_{i=1}^n sum_{j=1}^n Aij xi xj,  a scalar (1 x 1)

Bilinear form (a generalization of the quadratic product): for A (m x n), a (n x 1), and b (m x 1), their bilinear form is b^T (1 x m) A (m x n) a (n x 1):
b^T A a = sum_{i=1}^m sum_{j=1}^n Aij bi aj
Note that b^T A a = a^T A^T b.

  17. Covariance Matrices for Transformed Variables

What is the variance of the linear combination c1 x1 + c2 x2 + ... + cn xn? (Note this is a scalar.) Writing the combination as c^T x, with V the covariance matrix of x,
s2( c^T x ) = s2( sum_i ci xi ) = s( sum_i ci xi, sum_j cj xj )
            = sum_i sum_j s( ci xi, cj xj ) = sum_i sum_j ci cj s( xi, xj )
            = c^T V c
Likewise, the covariance between two linear combinations can be expressed as a bilinear form,
s( a^T x, b^T x ) = a^T V b
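A short simulation sketch (with an assumed covariance matrix V) confirming that c^T V c and a^T V b give the variance and covariance of linear combinations:

```python
import numpy as np

rng = np.random.default_rng(1)

V = np.array([[2.0, 0.5, 0.3],    # assumed 3 x 3 covariance matrix
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.5]])
c = np.array([1.0, -2.0, 0.5])
a = np.array([0.5, 0.5, 0.0])
b = np.array([1.0, 0.0, -1.0])

print(c @ V @ c)   # exact variance of c^T x
print(a @ V @ b)   # exact covariance of a^T x and b^T x

# Monte Carlo check with simulated MVN draws
x = rng.multivariate_normal(mean=np.zeros(3), cov=V, size=200_000)
print(np.var(x @ c), np.cov(x @ a, x @ b)[0, 1])   # close to the exact values
```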

  18. Now suppose we transform one vector of random variables into another vector of random variables. Transform x into (i) y (k x 1) = A (k x n) x (n x 1) and (ii) z (m x 1) = B (m x n) x (n x 1).

The covariance between the elements of these two transformed vectors of the original variables is the k x m covariance matrix A V B^T. For example, the covariance between yi and yj is given by the ij-th element of A V A^T. Likewise, the covariance between yi and zj is given by the ij-th element of A V B^T.

  19. The Multivariate Normal Distribution (MVN)

Consider the pdf for n independent normal random variables, the ith of which has mean mi and variance s2i:
p(x) = prod_{i=1}^n (2 pi)^(-1/2) si^(-1) exp( -(xi - mi)^2 / (2 s2i) )
     = (2 pi)^(-n/2) ( prod_{i=1}^n si )^(-1) exp( - sum_{i=1}^n (xi - mi)^2 / (2 s2i) )
This can be expressed more compactly in matrix form.

  20. Define the covariance matrix V for the vector x of the n normal random variables as the diagonal matrix with ith diagonal element s2i,
V = Diag( s21, s22, ..., s2n )
Fun fact! For a diagonal matrix D, the determinant Det = |D| = product of the diagonal elements, so here |V| = prod_{i=1}^n s2i.

Define the mean vector m by m = (m1, m2, ..., mn)^T. Then
sum_{i=1}^n (xi - mi)^2 / s2i = (x - m)^T V^-1 (x - m)

Hence in matrix form the MVN pdf becomes
p(x) = (2 pi)^(-n/2) |V|^(-1/2) exp( -(1/2) (x - m)^T V^-1 (x - m) )

Now notice that this holds for any mean vector m and any symmetric, positive-definite matrix V (pd -> a^T V a = Var(a^T x) > 0 for all nonzero a).
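A minimal sketch of the matrix form of the pdf, checked against the product of univariate densities for a diagonal V (the numbers are arbitrary):

```python
import numpy as np

def mvn_pdf(x, m, V):
    # (2 pi)^(-n/2) |V|^(-1/2) exp( -(x - m)^T V^-1 (x - m) / 2 )
    n = len(m)
    d = x - m
    quad = d @ np.linalg.solve(V, d)
    return (2 * np.pi) ** (-n / 2) * np.linalg.det(V) ** (-0.5) * np.exp(-0.5 * quad)

m = np.array([1.0, -1.0])
V = np.diag([2.0, 0.5])      # diagonal V: independent normals
x = np.array([0.5, 0.0])

# product of the two univariate normal densities
v = np.diag(V)
univariate = np.prod(np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v))

print(mvn_pdf(x, m, V), univariate)   # identical
```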

  21. Properties of the MVN - I

1) If x is MVN, any subset of the variables in x is also MVN.

2) If x is MVN, any linear combination of the elements of x is also MVN. If x is MVN(m, V), then
for y = x + a, y is MVN( m + a, V )
for y = a^T x = sum_{i=1}^n ai xi, y is N( a^T m, a^T V a )
for y = Ax, y is MVN( Am, A V A^T )

  22. Properties of the MVN - II

3) Conditional distributions are also MVN. Partition x into two components, x1 (an m-dimensional column vector) and x2 (an (n-m)-dimensional column vector), so that
x = ( x1; x2 ),  m = ( m1; m2 ),  and  V = [ Vx1x1  Vx1x2;  Vx1x2^T  Vx2x2 ]

Then x1 | x2 is MVN with m-dimensional mean vector
m(x1|x2) = m1 + Vx1x2 Vx2x2^-1 ( x2 - m2 )
and m x m covariance matrix
V(x1|x2) = Vx1x1 - Vx1x2 Vx2x2^-1 Vx1x2^T

  23. Properties of the MVN - III

4) If x is MVN, the regression of any subset of x on another subset is linear and homoscedastic:
x1 = m(x1|x2) + e = m1 + Vx1x2 Vx2x2^-1 ( x2 - m2 ) + e
where e is MVN with mean vector 0 and variance-covariance matrix V(x1|x2) = Vx1x1 - Vx1x2 Vx2x2^-1 Vx1x2^T.

The regression is linear because it is a linear function of x2, and it is homoscedastic because the variance-covariance matrix of e is a constant that does not change with the particular value of the random vector x2.

  24. Example: Regression of Offspring value on Parental values

Assume the vector of offspring value and the values of both its parents is MVN. Then from the correlations among (outbred) relatives,
( zo; zs; zd ) ~ MVN( ( mo; ms; md ),  sz2 [ 1  h2/2  h2/2;  h2/2  1  0;  h2/2  0  1 ] )

Let x1 = ( zo ) and x2 = ( zs; zd ). Then
Vx1x1 = sz2,  Vx1x2 = (h2 sz2 / 2) (1  1),  Vx2x2 = sz2 [ 1 0; 0 1 ]

Hence, from x1 = m1 + Vx1x2 Vx2x2^-1 ( x2 - m2 ) + e,
zo = mo + (h2 sz2 / 2) (1  1) (1/sz2) [ 1 0; 0 1 ] ( zs - ms; zd - md ) + e
   = mo + (h2/2)( zs - ms ) + (h2/2)( zd - md ) + e
where e is normal with mean zero and variance
s2e = sz2 - (h2 sz2 / 2) (1  1) (1/sz2) [ 1 0; 0 1 ] (h2 sz2 / 2) ( 1; 1 ) = sz2 ( 1 - h4/2 )
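The conditional-MVN recipe above can be checked numerically; h2 and sz2 below are assumed example values, not from the slides:

```python
import numpy as np

h2, sz2 = 0.4, 1.0   # assumed heritability and phenotypic variance

# covariance matrix of (zo, zs, zd)
V = sz2 * np.array([[1.0,    h2 / 2, h2 / 2],
                    [h2 / 2, 1.0,    0.0   ],
                    [h2 / 2, 0.0,    1.0   ]])

V11 = V[0, 0]          # Var(zo)
V12 = V[0:1, 1:3]      # Cov(zo, (zs, zd)), a 1 x 2 block
V22 = V[1:3, 1:3]      # Cov of (zs, zd)

coef = V12 @ np.linalg.inv(V22)                               # regression coefficients
resid_var = V11 - (V12 @ np.linalg.inv(V22) @ V12.T)[0, 0]    # conditional variance

print(coef)                                # [[0.2 0.2]] = [h2/2, h2/2]
print(resid_var, sz2 * (1 - h2**2 / 2))    # both 0.92 = sz2 (1 - h4/2)
```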

  25. Hence, the regression of offspring trait value given the trait values of its parents is
zo = mo + h2/2 (zs - ms) + h2/2 (zd - md) + e
where the residual e is normal with mean zero and Var(e) = sz2(1 - h4/2).

Similar logic gives the regression of offspring breeding value on parental breeding values as
Ao = mo + (As - ms)/2 + (Ad - md)/2 + e = As/2 + Ad/2 + e
(breeding values are expressed as deviations, with mean zero), where the residual e is normal with mean zero and Var(e) = sA2/2.

  26. Linear Models

  27. Quick Review of the Major Points

The general linear model can be written as
y = Xb + e
where y is the vector of observed dependent values, X is the design matrix (the observations of the variables in the assumed linear model), b is the vector of unknown parameters to estimate, and e is the vector of residuals (usually assumed MVN with mean zero).

If the residuals are homoscedastic and uncorrelated, so that we can write the cov matrix of e as Cov(e) = s2 I, then use the OLS estimate,
OLS(b) = (X^T X)^-1 X^T y
If the residuals are heteroscedastic and/or dependent, Cov(e) = V, and we must use the GLS solution for b,
GLS(b) = (X^T V^-1 X)^-1 X^T V^-1 y
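A compact simulation sketch of both estimators (all data below are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# simulate y = Xb + e with heteroscedastic, uncorrelated residuals
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
var_e = np.linspace(0.5, 5.0, n)             # residual variance differs by observation
y = X @ beta_true + rng.normal(scale=np.sqrt(var_e))

# OLS: b = (X^T X)^-1 X^T y
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# GLS with V = Cov(e): b = (X^T V^-1 X)^-1 X^T V^-1 y
V_inv = np.diag(1.0 / var_e)
b_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)

print(b_ols, b_gls)   # both near (1, 2); GLS weights the low-variance observations more
```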

  28. Linear Models

One tries to explain a dependent variable y as a linear function of a number of independent (or predictor) variables. A multiple regression is a typical linear model,
y = m + b1 x1 + b2 x2 + ... + bn xn + e
Here e is the residual, or deviation between the true value observed and the value predicted by the linear model.

The (partial) regression coefficients are interpreted as follows: a unit change in xi, while holding all other variables constant, results in a change of bi in y.

As with a univariate regression (y = a + bx + e), the model parameters are typically chosen by least squares, wherein they are chosen to minimize the sum of squared residuals, sum_i ei^2.

  29. Predictor and Indicator Variables

In a regression, the predictor variables are typically continuous, although they need not be. Suppose we are measuring the offspring of p sires. One linear model would be
yij = m + si + eij
where yij is the trait value of offspring j from sire i; m is the overall mean (this term is included to give the si terms a mean value of zero, i.e., they are expressed as deviations from the mean); si is the effect for sire i (the mean of its offspring; recall that the variance in the si estimates Cov(half sibs) = VA/2); and eij is the deviation of the jth offspring from the family mean of sire i (the variance of the e's estimates the within-family variance).

Note that the predictor variables here are the si, something that we are trying to estimate. We can write this in linear model form by using indicator variables,
xik = 1 if sire k = i, and 0 otherwise
giving the above model as
yij = m + sum_{k=1}^p sk xik + eij

  30. Models consisting entirely of indicator variables are typically called ANOVA, or analysis of variance, models. Models that contain no indicator variables (other than for the mean), but rather consist of observed values of continuous or discrete variables, are typically called regression models. Both are special cases of the General Linear Model (or GLM).

Example: a nested half sib/full sib design with an age correction b on the trait,
yijk = m + si + dij + b xijk + eijk
Here the si and dij terms form the ANOVA-model part, while the regression on age, b xijk, is the regression-model part.

  31. Linear Models in Matrix Form

Suppose we have 3 variables in a multiple regression, with four (y, x) vectors of observations,
yi = m + b1 xi1 + b2 xi2 + b3 xi3 + ei
In matrix form, y = Xb + e, with
y = ( y1; y2; y3; y4 ),  b = ( m; b1; b2; b3 ),  e = ( e1; e2; e3; e4 )
X = [ 1 x11 x12 x13;  1 x21 x22 x23;  1 x31 x32 x33;  1 x41 x42 x43 ]
b is the vector of parameters to estimate. X is the design matrix: details of both the experimental design and the observed values of the predictor variables all reside solely in X.

  32. The Sire Model in Matrix Form

The model here is yij = m + si + eij. Consider three sires: sire 1 has 2 offspring, sire 2 has one, and sire 3 has three. The GLM form y = Xb + e is
y = ( y11; y12; y21; y31; y32; y33 ),  b = ( m; s1; s2; s3 ),  e = ( e11; e12; e21; e31; e32; e33 )
X = [ 1 1 0 0;  1 1 0 0;  1 0 1 0;  1 0 0 1;  1 0 0 1;  1 0 0 1 ]

  33. Ordinary Least Squares (OLS)

When the covariance structure of the residuals has a certain form, we solve for the vector b using OLS. If the residuals are homoscedastic and uncorrelated, s2(ei) = s2e and s(ei, ej) = 0, then each residual is equally weighted, and the sum of squared residuals can be written as
sum_{i=1}^n ei^2 = e^T e = ( y - Xb )^T ( y - Xb )
where Xb gives the predicted values of the y's.

Taking (matrix) derivatives shows this is minimized by
b = ( X^T X )^-1 X^T y
This is the OLS estimate of the vector b. If the residuals follow a MVN distribution, the OLS solution is also the ML solution.

The variance-covariance estimate for the sample estimates is
Vb = ( X^T X )^-1 s2e
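A worked numpy sketch of these OLS formulas (the data are simulated; nothing here is from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)

# simulate yi = m + b1 xi1 + b2 xi2 + b3 xi3 + ei
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta_true = np.array([2.0, 1.0, -0.5, 0.25])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                 # OLS estimate (X^T X)^-1 X^T y

e = y - X @ b                         # residuals
s2e = e @ e / (n - X.shape[1])        # estimate of the residual variance
Vb = XtX_inv * s2e                    # sampling (co)variances of the estimates

print(b)                              # close to (2, 1, -0.5, 0.25)
print(np.sqrt(np.diag(Vb)))           # standard errors of the estimates
```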

  34. Example: Regression Through the Origin

The model is yi = b xi + ei, so
y = ( y1; ...; yn ),  X = ( x1; ...; xn ),  b = ( b )
Here
X^T X = sum_{i=1}^n xi^2  and  X^T y = sum_{i=1}^n xi yi
giving
b = ( X^T X )^-1 X^T y = sum_i xi yi / sum_i xi^2
s2(b) = ( X^T X )^-1 s2e = s2e / sum_i xi^2
with
s2e = ( 1/(n-1) ) sum_{i=1}^n ( yi - b xi )^2

  35. Polynomial Regressions and Interaction Effects

The GLM can easily handle any function of the observed predictor variables, provided the parameters to estimate are still linear, e.g.
y = a + b1 f(x) + b2 g(x) + ... + e

Quadratic regression:
yi = a + b1 xi + b2 xi^2 + ei
b = ( a; b1; b2 ),  X = [ 1 x1 x1^2;  1 x2 x2^2;  ...;  1 xn xn^2 ]

Interaction terms (e.g. sex x age) are handled similarly:
yi = a + b1 xi1 + b2 xi2 + b3 xi1 xi2 + ei
b = ( a; b1; b2; b3 ),  X = [ 1 x11 x12 x11*x12;  1 x21 x22 x21*x22;  ...;  1 xn1 xn2 xn1*xn2 ]
With x1 held constant, a unit change in x2 changes y by b2 + b3 x1. Likewise, a unit change in x1 changes y by b1 + b3 x2.
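Building these design matrices is just a matter of adding columns; a brief sketch with made-up predictor values:

```python
import numpy as np

x  = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x1 = np.array([0.0, 1.0, 0.0, 1.0, 0.0])   # e.g. sex coded 0/1
x2 = np.array([2.0, 3.0, 5.0, 7.0, 9.0])   # e.g. age

# quadratic regression: y = a + b1 x + b2 x^2 + e
X_quad = np.column_stack([np.ones_like(x), x, x**2])

# interaction model: y = a + b1 x1 + b2 x2 + b3 x1 x2 + e
X_inter = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

print(X_quad)
print(X_inter)   # still linear in the parameters, so the OLS/GLS machinery applies unchanged
```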

  36. Fixed vs. Random Effects

In linear models we are trying to accomplish two goals: estimate the values of the model parameters and estimate any appropriate variances. For example, in the simplest regression model, y = a + bx + e, we estimate the values for a and b and also the variance of e. We can, of course, also estimate the ei = yi - (a + b xi).

Note that a and b are fixed constants we are trying to estimate (fixed factors or fixed effects), while the ei values are drawn from some probability distribution (typically normal with mean 0 and variance s2e). The ei are random effects.

  37. This distinction between fixed and random effects is extremely important in terms of how we analyze a model. If a parameter is a fixed constant we wish to estimate, it is a fixed effect. If a parameter is drawn from some probability distribution and we are trying to make inferences about either that distribution and/or specific realizations from it, it is a random effect. We generally speak of estimating fixed factors and predicting random effects. "Mixed" models contain both fixed and random factors.

  38. Example: Sire model, yij = m + si + eij. Is the sire effect fixed or random?

It depends. If we have (say) 10 sires and we are ONLY interested in the values of these particular 10 sires, and don't care to make any other inferences about the population from which the sires are drawn, then we can treat them as fixed effects. In this case, the model is fully specified by the covariance structure for the residuals. Thus, we need to estimate m, s1 to s10, and s2e, and we write the model as
yij = m + si + eij,  s2(eij) = s2e   (fixed-effect interpretation)

Conversely, if we are not only interested in these 10 particular sires but also wish to make some inference about the population from which they were drawn (such as the additive variance, since s2A = 4 s2s), then the si are random effects. In this case we wish to estimate m and the variances s2s and s2e. Since 2si also estimates (or predicts) the breeding value for sire i, we also wish to estimate (predict) these as well. Under a random-effects interpretation, we write the model as
yij = m + si + eij,  s2(eij) = s2e,  s2(si) = s2s

  39. Generalized Least Squares (GLS)

Suppose the residuals no longer have the same variance (i.e., they display heteroscedasticity). Clearly we do not wish to minimize the unweighted sum of squared residuals, because residuals with smaller variance should receive more weight. Likewise, in the event the residuals are correlated, we also wish to take this into account (i.e., perform a suitable transformation to remove the correlations) before minimizing the sum of squares. Either of these settings leads to a GLS solution in place of an OLS solution.

  40. In the GLS setting, the covariance matrix for the vector e of residuals is written as R, where Rij = s(ei, ej). The linear model becomes y = Xb + e, with cov(e) = R.

The GLS solution for b is
b = ( X^T R^-1 X )^-1 X^T R^-1 y
The variance-covariance of the estimated model parameters is given by
Vb = ( X^T R^-1 X )^-1 s2e

  41. Example

One common setting is where the residuals are uncorrelated but heteroscedastic, with s2(ei) = s2e / wi. For example, if sample i is the mean value of ni individuals, then s2(ei) = s2e / ni, so here wi = ni. In this case,
R = Diag( w1^-1, w2^-1, ..., wn^-1 )

Consider the model yi = a + b xi, so that
b = ( a; b )  and  X = [ 1 x1;  1 x2;  ...;  1 xn ]

  42. Here R^-1 = Diag( w1, w2, ..., wn ), giving
X^T R^-1 X = w [ 1  xw;  xw  x2w ]  and  X^T R^-1 y = w ( yw;  xyw )
where the weighted averages are
w = sum_{i=1}^n wi,  xw = sum_i wi xi / w,  x2w = sum_i wi xi^2 / w,
yw = sum_i wi yi / w,  xyw = sum_i wi xi yi / w

  43. This gives the GLS estimators of a and b as
a = yw - b xw
b = ( xyw - xw yw ) / ( x2w - xw^2 )
Likewise, the resulting variances and covariance for these estimators are
s2(a) = s2e x2w / ( w ( x2w - xw^2 ) )
s2(b) = s2e / ( w ( x2w - xw^2 ) )
s(a, b) = - s2e xw / ( w ( x2w - xw^2 ) )
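A simulation sketch of this weighted (GLS) regression; the group sizes and data are invented, and the matrix solution is checked against the weighted-average formulas above:

```python
import numpy as np

rng = np.random.default_rng(4)

# sample i is a mean of ni individuals, so s2(ei) = s2e / ni and wi = ni
ni = rng.integers(2, 30, size=40).astype(float)
x = rng.normal(size=40)
y = 1.0 + 0.5 * x + rng.normal(scale=np.sqrt(1.0 / ni))   # s2e assumed to be 1

X = np.column_stack([np.ones_like(x), x])
R_inv = np.diag(ni)                                       # R^-1 = Diag(w1, ..., wn)

# GLS: b = (X^T R^-1 X)^-1 X^T R^-1 y
ab = np.linalg.solve(X.T @ R_inv @ X, X.T @ R_inv @ y)

# the same estimates from the weighted averages of slide 43
w = ni.sum()
xw, yw = (ni * x).sum() / w, (ni * y).sum() / w
x2w, xyw = (ni * x**2).sum() / w, (ni * x * y).sum() / w
b = (xyw - xw * yw) / (x2w - xw**2)
a = yw - b * xw

print(ab, (a, b))   # identical
```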

  44. Chi-square and F distributions

Let Ui be a unit normal. The sum U1^2 + U2^2 + ... + Uk^2 is a chi-square random variable with k degrees of freedom.

Under appropriate normality assumptions, the sums of squares that appear in linear models are also chi-square distributed. In particular,
sum_{i=1}^n ( xi - xbar )^2 / s2 ~ chi-square with n - 1 degrees of freedom

The ratio of two chi-squares is an F distribution.

  45. In particular, an F distribution with k numerator degrees of freedom and n denominator degrees of freedom is given by
F(k, n) = ( chi-square_k / k ) / ( chi-square_n / n )
Thus, F distributions frequently arise in tests of linear models, as these usually involve ratios of sums of squares.

  46. Sums of Squares in linear models

The total sum of squares (SST) of a linear model can be written as the sum of the error (or residual) sum of squares (SSE) and the model (or regression) sum of squares (SSM):
SST = SSM + SSE
SST = sum_i ( yi - ybar )^2,  SSM = sum_i ( yhat_i - ybar )^2,  SSE = sum_i ( yi - yhat_i )^2
where yhat_i is the value of yi predicted by the fitted model.

r2, the coefficient of determination, is the fraction of variation accounted for by the model:
r2 = SSM / SST = 1 - SSE / SST

  47. Sums of Squares are quadratic products

SST = sum_{i=1}^n ( yi - ybar )^2 = sum_i yi^2 - (1/n) ( sum_i yi )^2
We can write this as a quadratic product,
SST = y^T y - (1/n) y^T J y = y^T ( I - (1/n) J ) y
where J is a matrix all of whose elements are 1's. Likewise,
SSE = sum_{i=1}^n ( yi - yhat_i )^2 = sum_i ei^2 = y^T ( I - X ( X^T X )^-1 X^T ) y
and
SSM = SST - SSE = y^T ( X ( X^T X )^-1 X^T - (1/n) J ) y

  48. Hypothesis testing

Provided the residual errors in the model are MVN, then for a model with n observations and p estimated parameters,
SSE / s2e ~ chi-square with n - p degrees of freedom

Consider the comparison of a full model (p parameters) with a reduced model (q < p parameters). The difference in the error sums of squares for the full and reduced models provides a test of whether the two fits are the same:
F = [ ( SSE_r - SSE_f ) / ( p - q ) ] / [ SSE_f / ( n - p ) ] = ( (n - p)/(p - q) ) ( SSE_r / SSE_f - 1 )
This ratio follows an F(p-q, n-p) distribution.
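A minimal sketch of the full-versus-reduced F test with simulated data (the second predictor has no real effect, so the F statistic should usually be small):

```python
import numpy as np

rng = np.random.default_rng(5)

n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.8 * x1 + rng.normal(size=n)      # y depends on x1 but not x2

def sse(X, y):
    # error sum of squares y^T (I - X (X^T X)^-1 X^T) y from an OLS fit
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

X_full = np.column_stack([np.ones(n), x1, x2])   # p = 3 parameters
X_red  = np.column_stack([np.ones(n), x1])       # q = 2 parameters
p, q = X_full.shape[1], X_red.shape[1]

SSE_f, SSE_r = sse(X_full, y), sse(X_red, y)
F = ((SSE_r - SSE_f) / (p - q)) / (SSE_f / (n - p))
print(F)   # compare to an F distribution with (p - q, n - p) degrees of freedom
```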

  49. Does our model account for a significant fraction of the variation?

Here the reduced model is just yi = m + ei. In this case, the error sum of squares for the reduced model is just the total sum of squares, SSE_r = SST, and the F-test ratio becomes
F = ( (n - p)/(p - 1) ) ( SST - SSE_f ) / SSE_f = ( (n - p)/(p - 1) ) ( r2 / (1 - r2) )
This ratio follows an F(p-1, n-p) distribution.
