Design of Statistical Investigations




2 Background Stats

Stephen Senn

SJS SDI_2

linear combinations:

1) If $X_i$ is a random variable with expected value $E[X_i] = \mu_i$ and variance $V[X_i] = \sigma_i^2$, and $a$ and $b$ are two constants, then $E[a + bX_i] = a + b\mu_i$ and $V[a + bX_i] = b^2\sigma_i^2$.

2) If $X_i$ and $X_j$ are two random variables, then $E[aX_i + bX_j] = a\mu_i + b\mu_j$ and $V[aX_i + bX_j] = a^2\sigma_i^2 + b^2\sigma_j^2 + 2ab\sigma_{ij}$, where $\sigma_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)]$ is known as the covariance of $X_i$ and $X_j$.

3) If $X_1, X_2, \ldots, X_n$ are $n$ independent random variables with expectations $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, respectively, then $\sum a_i X_i$ has expectation $\sum a_i \mu_i$ and variance $\sum a_i^2 \sigma_i^2$.
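The rule for the mean and variance of a linear combination of two variables can be checked exactly on a small example. The sketch below treats five hypothetical pairs of values as a population in which each pair is equally likely; the data and the constants `a` and `b` are illustrative assumptions, not taken from the slides.

```python
from statistics import fmean

# Hypothetical paired values (assumption): each pair (xi[m], xj[m]) is equally likely.
xi = [2.0, 4.0, 4.0, 7.0, 9.0]
xj = [1.0, 3.0, 2.0, 6.0, 5.0]
a, b = 3.0, -2.0


def cov(u, v):
    """Population covariance E[(U - mu_u)(V - mu_v)]; cov(u, u) is the variance."""
    mu_u, mu_v = fmean(u), fmean(v)
    return fmean([(p - mu_u) * (q - mu_v) for p, q in zip(u, v)])


# The linear combination a*Xi + b*Xj, evaluated pair by pair.
combo = [a * p + b * q for p, q in zip(xi, xj)]

# Left-hand sides: moments computed directly from the combination.
lhs_mean = fmean(combo)
lhs_var = cov(combo, combo)

# Right-hand sides: the rules for linear combinations.
rhs_mean = a * fmean(xi) + b * fmean(xj)
rhs_var = a**2 * cov(xi, xi) + b**2 * cov(xj, xj) + 2 * a * b * cov(xi, xj)

print(lhs_mean, rhs_mean)
print(lhs_var, rhs_var)
```

Because the two variables here are correlated, dropping the covariance term would make the two variances disagree.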


Expected value of a corrected sum of squares

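If $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population with variance $\sigma^2$, then

$$\sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,$$

is known as the corrected sum of squares and has expected value $(n - 1)\sigma^2$.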

NB The factor (n - 1), known as the degrees of freedom, arises because the correction point (in this case the sample mean) is estimated from the data. In general we lose one degree of freedom for every constant fitted.
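The expected value of the corrected sum of squares can be verified by brute force. As a minimal sketch, the code below assumes a hypothetical four-value population sampled with replacement (so the draws are independent), enumerates every possible sample of size n = 3, and averages the corrected sum of squares over them.

```python
from itertools import product
from statistics import fmean, pvariance

# Hypothetical population (assumption): each value is equally likely on every draw.
population = [1.0, 2.0, 4.0, 7.0]
n = 3


def corrected_ss(sample):
    """Sum of squared deviations from the sample mean."""
    xbar = fmean(sample)
    return sum((x - xbar) ** 2 for x in sample)


# Every ordered sample of size n drawn with replacement is equally likely,
# so the plain average over all of them is the expected value.
all_samples = list(product(population, repeat=n))
expected_css = fmean([corrected_ss(s) for s in all_samples])

sigma2 = pvariance(population)  # population variance (divisor N, not N - 1)

print(expected_css, (n - 1) * sigma2)
```

Replacing the sample mean in `corrected_ss` with the true population mean would raise the average to n times the variance, which is the lost degree of freedom in action.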


Distribution of a corrected sum of squares

If a corrected sum of squares, CSS, with $\nu$ degrees of freedom is calculated from a random sample from a Normal distribution with variance $\sigma^2$, then $\mathrm{CSS}/\sigma^2$ has a chi-square distribution with $\nu$ degrees of freedom.

chi-square statistics

If $Y_1$ has a chi-square distribution with $\nu_1$ degrees of freedom and $Y_2$ is independently distributed as a chi-square with $\nu_2$ degrees of freedom, then $Y = Y_1 + Y_2$ has a chi-square distribution with $\nu_1 + \nu_2$ degrees of freedom.
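A simulation gives a rough check of both chi-square statements. The sketch below relies on two standard facts, that a chi-square variable with a given number of degrees of freedom has mean equal to the degrees of freedom and variance equal to twice that, and it uses assumed values for the Normal mean, standard deviation, sample sizes and seed.

```python
import random
from statistics import fmean, pvariance

random.seed(1)   # fixed seed so the run is repeatable
sigma = 2.0      # assumed population standard deviation
reps = 20000     # number of simulated samples


def scaled_css(n, sigma):
    """Corrected sum of squares of a Normal sample of size n, divided by sigma^2."""
    sample = [random.gauss(10.0, sigma) for _ in range(n)]
    xbar = fmean(sample)
    return sum((x - xbar) ** 2 for x in sample) / sigma**2


y1 = [scaled_css(6, sigma) for _ in range(reps)]   # n = 6, so 5 degrees of freedom
y2 = [scaled_css(4, sigma) for _ in range(reps)]   # n = 4, so 3 degrees of freedom
y = [u + v for u, v in zip(y1, y2)]                # independent sum: 8 degrees of freedom

print(fmean(y1), pvariance(y1))   # theory: 5 and 10
print(fmean(y), pvariance(y))     # theory: 8 and 16
```

The simulated moments should sit close to the theoretical ones, with agreement improving as `reps` grows.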

t-statistics

If $Z$ is a random variable which is Normally distributed with mean 0 and variance 1 and $Y$ is independently distributed as a chi-square with $\nu$ degrees of freedom, then

$$t = \frac{Z}{\sqrt{Y/\nu}}$$

has a t distribution with $\nu$ degrees of freedom.
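The construction of a t variable can be mimicked directly. This sketch builds the chi-square denominator as a sum of squared standard Normals and compares the simulated variance with the known value for a t distribution, which is the degrees of freedom divided by the degrees of freedom minus two; the choice of 10 degrees of freedom, the seed and the number of replications are assumptions for illustration.

```python
import math
import random
from statistics import fmean, pvariance

random.seed(2)   # fixed seed so the run is repeatable
nu = 10          # assumed degrees of freedom
reps = 20000


def chi_square(nu):
    """Chi-square draw built as a sum of nu squared standard Normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))


def t_draw(nu):
    """Standard Normal divided by the root of an independent chi-square over its d.f."""
    z = random.gauss(0.0, 1.0)
    y = chi_square(nu)
    return z / math.sqrt(y / nu)


t_values = [t_draw(nu) for _ in range(reps)]

print(fmean(t_values))       # theory: 0
print(pvariance(t_values))   # theory: nu / (nu - 2) = 1.25
```

The variance exceeds 1 because the random denominator adds spread beyond that of the standard Normal numerator.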


Further Variate Relations

The square of a t is distributed F1,

The square of a Normal (0,1) is distributed 21

The sum of a series of Normally distributed random variables is itself Normally distributed with mean and variance given by the rule for linear combinations.

The ratio of two independent random chi-square variables , each divided by its degrees of freedom is an F r.v. with corresponding degrees of freedom. (If the numerator chi-square has d.f.  and the denominator has d.f.  then the resulting r.v. is F , )
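The F construction can be simulated in the same spirit. The sketch below forms the ratio of two independent chi-square draws, each divided by its degrees of freedom, and compares the simulated mean with the known mean of an F distribution, which depends only on the denominator degrees of freedom d and equals d/(d - 2). The degrees of freedom, seed and replication count are illustrative assumptions.

```python
import random
from statistics import fmean

random.seed(3)       # fixed seed so the run is repeatable
nu1, nu2 = 5, 20     # assumed numerator and denominator degrees of freedom
reps = 20000


def chi_square(nu):
    """Chi-square draw built as a sum of nu squared standard Normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))


def f_draw(nu1, nu2):
    """Ratio of two independent chi-squares, each divided by its degrees of freedom."""
    return (chi_square(nu1) / nu1) / (chi_square(nu2) / nu2)


f_values = [f_draw(nu1, nu2) for _ in range(reps)]

print(fmean(f_values))   # theory: nu2 / (nu2 - 2)
```

Setting `nu1 = 1` reproduces the square of a t variable, since the numerator is then a single squared standard Normal.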


Regression

A model of the form

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \epsilon_i, \qquad i = 1, \ldots, n,$$

where $Y_i$ is a response measured on the $i$th individual (for example patient $i$), $X_{1i}, X_{2i}$ etc. are measurements of linear predictors (covariates) for the $i$th individual and $\epsilon_i$ is a stochastic disturbance term, is known as a general linear model and may be expressed in matrix form

$$Y = X\beta + \epsilon$$

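where

$Y$ is an $n \times 1$ vector of responses,

$X$ is an $n \times (k + 1)$ matrix of predictors consisting of $k + 1$ column vectors of length $n$, where the first column vector has all $n$ elements equal to 1 and the next $k$ columns represent the linear predictors $X_1$ to $X_k$,

$\beta$ is the $(k + 1) \times 1$ vector of coefficients $\beta_0, \beta_1, \ldots, \beta_k$, and

$\epsilon$ is an $n \times 1$ vector of disturbance terms with $E[\epsilon] = 0$, usually assumed independent and of constant variance $\sigma^2$, so that

$$E[\epsilon\epsilon^T] = \sigma^2 I,$$

where $I$ is an $n \times n$ identity matrix.

The Ordinary Least Squares (OLS) estimator of $\beta$ is

$$b = (X^T X)^{-1} X^T Y \tag{1.1}$$

and its variance (variance-covariance matrix) is

$$V[b] = \sigma^2 (X^T X)^{-1} = \sigma^2 A, \qquad A = (X^T X)^{-1}. \tag{1.2}$$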

If the further assumption is made that the $\epsilon_i$ terms are Normally distributed, then $b$ has a multivariate Normal distribution and the individual elements of $b$ are Normally distributed with variances identifiable from (1.2).

In practice, $\sigma^2$ will be unknown but has the unbiased estimate

$$s^2 = e^T e/(n - k - 1),$$

where $e = Y - Xb$ is the vector of residuals from the fitted model.

If $\beta_j = 0$, the ratio of $b_j$ to $s\sqrt{a_{jj}}$ has a t-distribution with $n - k - 1$ degrees of freedom, where $b_j$ is the $j$th element of $b$ and $a_{jj}$ is the $j$th diagonal element of $A = (X^T X)^{-1}$. More generally, $(b_j - \beta_j)/(s\sqrt{a_{jj}})$ has this distribution.

This fact may be used to test hypotheses about any element of $\beta$ and to construct confidence intervals for it.
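The least-squares calculations can be written out by hand for the simplest case of a single predictor (k = 1), where the matrix to be inverted is only 2 x 2. The sketch below uses a small hypothetical data set and plain Python; the variable names are illustrative, and a real analysis would normally use a linear algebra or statistics library.

```python
import math

# Hypothetical data (assumption): one predictor, n = 6 observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n, k = len(x), 1

# Elements of X^T X and X^T Y for the design matrix whose columns are 1 and x.
sx = sum(x)
sxx = sum(v * v for v in x)
sy = sum(y)
sxy = sum(u * v for u, v in zip(x, y))

# A = (X^T X)^{-1}, written out explicitly for the 2 x 2 case.
det = n * sxx - sx * sx
A = [[sxx / det, -sx / det],
     [-sx / det, n / det]]

# OLS estimator b = A X^T Y: b[0] is the intercept, b[1] the slope.
b = [A[0][0] * sy + A[0][1] * sxy,
     A[1][0] * sy + A[1][1] * sxy]

# Residuals e = Y - Xb and the unbiased variance estimate s^2.
e = [yi - (b[0] + b[1] * xi) for xi, yi in zip(x, y)]
s2 = sum(r * r for r in e) / (n - k - 1)

# Standard errors s * sqrt(a_jj) and the t ratios for testing each coefficient against 0.
se = [math.sqrt(s2 * A[j][j]) for j in range(k + 1)]
t_ratios = [b[j] / se[j] for j in range(k + 1)]

print("b =", b)
print("s^2 =", s2)
print("standard errors =", se)
print("t ratios (n - k - 1 = %d d.f.) =" % (n - k - 1), t_ratios)
```

Each t ratio would then be referred to a t distribution with n - k - 1 = 4 degrees of freedom.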


The Bivariate Normal

The bivariate Normal first received extensive application in statistical analysis in the work of Francis Galton (1822-1911) who was a UCL man! These are some brief notes about some mathematical aspects of it.

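If the joint probability density function of two random variables $X$ and $Y$ is given by

$$f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}} \exp\left\{-\frac{Q}{2}\right\} \tag{1.3}$$

where

$$Q = \frac{1}{1 - \rho^2}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right],$$

then $X$ and $Y$ are said to have a bivariate Normal distribution.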

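$\mu_X$, $\mu_Y$, $\sigma_X$, $\sigma_Y$ and $\rho$ are parameters of the distribution.

Since (1.3) is a p.d.f.,

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1.$$

A contour plot of a bivariate Normal is given on the next slide.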
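The statement that the joint density integrates to 1 can be checked numerically. The sketch below codes the standard textbook form of the bivariate Normal density and sums it over a fine grid with the midpoint rule; the parameter values, the grid size and the truncation at 8 standard deviations are assumptions chosen for illustration.

```python
import math


def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Joint density of the bivariate Normal in its standard textbook form."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    const = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / const


# Hypothetical parameter values (assumption).
params = dict(mu_x=0.0, mu_y=1.0, sigma_x=1.0, sigma_y=2.0, rho=0.6)


def total_probability(steps=200, width=8.0):
    """Midpoint-rule integral of the density over +/- width standard deviations."""
    hx = 2.0 * width * params["sigma_x"] / steps
    hy = 2.0 * width * params["sigma_y"] / steps
    x0 = params["mu_x"] - width * params["sigma_x"]
    y0 = params["mu_y"] - width * params["sigma_y"]
    total = 0.0
    for i in range(steps):
        x = x0 + (i + 0.5) * hx
        for j in range(steps):
            y = y0 + (j + 0.5) * hy
            total += bivariate_normal_pdf(x, y, **params)
    return total * hx * hy


total = total_probability()
print(total)   # should be very close to 1
```

Evaluating `bivariate_normal_pdf` on the same grid and joining points of equal height is also how a contour plot of the density would be produced.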

[Figure: contour plot of a bivariate Normal density]

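If we integrate out $Y$, we obtain the marginal distribution of $X$, and this is in fact a Normal with mean $\mu_X$ and variance $\sigma_X^2$. Thus

$$f_X(x) = \frac{1}{\sigma_X\sqrt{2\pi}} \exp\left\{-\frac{(x - \mu_X)^2}{2\sigma_X^2}\right\} \tag{1.4}$$

and similarly, by integrating out $X$, we obtain the marginal distribution of $Y$, which is also a Normal distribution:

$$f_Y(y) = \frac{1}{\sigma_Y\sqrt{2\pi}} \exp\left\{-\frac{(y - \mu_Y)^2}{2\sigma_Y^2}\right\} \tag{1.5}$$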
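Integrating out one variable can also be done numerically. As a minimal sketch under assumed parameter values, the code below integrates the joint density over y at a few fixed values of x and compares the result with the univariate Normal density for X.

```python
import math


def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Joint density of the bivariate Normal in its standard textbook form."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    const = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / const


def normal_pdf(x, mu, sigma):
    """Univariate Normal density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical parameter values (assumption).
mu_x, mu_y, sigma_x, sigma_y, rho = 0.0, 1.0, 1.0, 2.0, 0.6


def marginal_x(x, steps=2000, width=10.0):
    """Integrate the joint density over y (midpoint rule) at a fixed x."""
    h = 2.0 * width * sigma_y / steps
    y0 = mu_y - width * sigma_y
    return h * sum(
        bivariate_normal_pdf(x, y0 + (j + 0.5) * h, mu_x, mu_y, sigma_x, sigma_y, rho)
        for j in range(steps)
    )


for x in (-1.5, 0.0, 0.7, 2.0):
    print(x, marginal_x(x), normal_pdf(x, mu_x, sigma_x))
```

The two columns agree whatever value is chosen for the correlation, which shows that the marginal distribution of X does not depend on it.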


From (1.4) and (1.5) we see that $\mu_X, \mu_Y$ and $\sigma_X^2, \sigma_Y^2$ are respectively the means of $X$ and $Y$ and the variances of $X$ and $Y$. The parameter $\rho$ is known as the correlation coefficient and was studied extensively by Galton.

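We are often interested in the conditional distribution of $Y$ given $X$ and vice versa. These also turn out to be Normal distributions. In fact we have

$$f(y \mid x) = \frac{1}{\sigma_{Y|X}\sqrt{2\pi}} \exp\left\{-\frac{(y - \mu_{Y|X})^2}{2\sigma_{Y|X}^2}\right\} \tag{1.6}$$

where

$$\mu_{Y|X} = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X) \tag{1.7}$$

and

$$\sigma_{Y|X}^2 = \sigma_Y^2(1 - \rho^2) \tag{1.8}$$

and, of course, an analogous expression exists for the conditional distribution of $X$ given $Y$, exchanging $Y$ for $X$ and vice versa.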


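Equation (1.7) is that of a straight line in $x$ with intercept $\mu_Y - \rho(\sigma_Y/\sigma_X)\mu_X$ and slope $\rho\sigma_Y/\sigma_X$. Given a bivariate Normal, for particular values of $X$ we will find that the average value of $Y$ lies on this straight line. The degree of scatter about this line is constant and is given by (1.8).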
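The straight-line behaviour of the conditional mean can be illustrated by simulation. The sketch below draws X from its marginal Normal distribution, then draws Y from a Normal whose mean is linear in x with slope equal to the correlation times the ratio of the standard deviation of Y to that of X, and whose variance is the variance of Y times one minus the squared correlation. It then fits a least-squares line to the simulated pairs. The parameter values, seed and sample size are assumptions.

```python
import math
import random
from statistics import fmean

random.seed(4)   # fixed seed so the run is repeatable

# Hypothetical parameter values (assumption).
mu_x, mu_y, sigma_x, sigma_y, rho = 0.0, 1.0, 1.0, 2.0, 0.6
n = 20000

# Line for the conditional mean of Y given X, and the constant conditional s.d.
slope = rho * sigma_y / sigma_x
intercept = mu_y - slope * mu_x
cond_sd = sigma_y * math.sqrt(1.0 - rho**2)

xs, ys = [], []
for _ in range(n):
    x = random.gauss(mu_x, sigma_x)                   # X from its marginal
    y = random.gauss(intercept + slope * x, cond_sd)  # Y from its conditional given x
    xs.append(x)
    ys.append(y)

# Least-squares line of y on x fitted to the simulated pairs.
xbar, ybar = fmean(xs), fmean(ys)
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
fitted_slope = sxy / sxx
fitted_intercept = ybar - fitted_slope * xbar

# Scatter about the fitted line, and the overall variance of Y.
resid_var = sum((y - fitted_intercept - fitted_slope * x) ** 2
                for x, y in zip(xs, ys)) / (n - 2)
var_y = sum((y - ybar) ** 2 for y in ys) / (n - 1)

print(fitted_slope, slope)
print(fitted_intercept, intercept)
print(resid_var, cond_sd**2)
print(var_y, sigma_y**2)
```

The last line shows that the overall variance of Y is recovered even though each conditional distribution is narrower.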
