
# Sampling from a MVN Distribution



### Sampling from a MVN Distribution

BMTRY 726

1/17/2014

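• We can estimate a sample mean for $X_1, X_2, \dots, X_n$:

$$\bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j$$

• Now we can estimate the mean of our sample

• But what about the properties of $\bar{X}$?

• It is an unbiased estimate of the mean: $E(\bar{X}) = \mu$

• It is a sufficient statistic

• Also, the sampling distribution is:

$$\bar{X} \sim N_p\left(\mu, \tfrac{1}{n}\Sigma\right)$$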

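• And the sample covariance for $X_1, X_2, \dots, X_n$:

$$S = \frac{1}{n-1}\sum_{j=1}^{n}(X_j - \bar{X})(X_j - \bar{X})'$$

• Sample variance of the $i$th trait: $s_{ii} = \frac{1}{n-1}\sum_{j=1}^{n}(x_{ji} - \bar{x}_i)^2$

• Sample covariance of traits $i$ and $k$: $s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n}(x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)$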

• So we can also estimate the variance of our sample
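
The two estimators can be computed directly from their definitions. The following is a minimal sketch using only the Python standard library; the function names and the toy data are hypothetical and chosen for illustration, and `S` uses the divisor $n - 1$.

```python
def sample_mean(X):
    """Column-wise mean of an n x p data matrix given as a list of rows."""
    n = len(X)
    p = len(X[0])
    return [sum(row[i] for row in X) / n for i in range(p)]


def sample_cov(X):
    """Unbiased sample covariance matrix S (divisor n - 1)."""
    n = len(X)
    p = len(X[0])
    xbar = sample_mean(X)
    S = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[i] - xbar[i] for i in range(p)]   # deviation from the mean
        for i in range(p):
            for k in range(p):
                S[i][k] += d[i] * d[k] / (n - 1)
    return S


# Hypothetical toy data: n = 4 observations on p = 2 traits.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 5.0], [6.0, 9.0]]
xbar = sample_mean(X)
S = sample_cov(X)
```

The diagonal of `S` holds the sample variances and the off-diagonal entries hold the sample covariances, so `S` is symmetric by construction.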

• And like $\bar{X}$, $S$ also has some nice properties

• It is an unbiased estimate of the variance: $E(S) = \Sigma$

• It is also a sufficient statistic

• It is also independent of $\bar{X}$

• But what about the sampling distribution of S?

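Given $Z_1, Z_2, \dots, Z_n \overset{iid}{\sim} N_p(0, \Sigma)$, the distribution of $\sum_{j=1}^{n} Z_j Z_j'$ is called a Wishart distribution with $n$ degrees of freedom.

For a random sample from $N_p(\mu, \Sigma)$, $(n-1)S = \sum_{j=1}^{n}(X_j - \bar{X})(X_j - \bar{X})'$ has a Wishart distribution with $n - 1$ degrees of freedom.

The density function is

$$w_{n-1}(A \mid \Sigma) = \frac{|A|^{(n-p-2)/2}\, e^{-\operatorname{tr}(A\Sigma^{-1})/2}}{2^{p(n-1)/2}\, \pi^{p(p-1)/4}\, |\Sigma|^{(n-1)/2}\, \prod_{i=1}^{p}\Gamma\left(\tfrac{1}{2}(n-i)\right)}$$

where $A$ and $\Sigma$ are positive definite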

• The Wishart distribution is the multivariate analog of the central chi-squared distribution.

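• If $A_1 \sim W_{m_1}(\Sigma)$ and $A_2 \sim W_{m_2}(\Sigma)$ are independent (Wishart with $m_1$ and $m_2$ degrees of freedom and the same $\Sigma$), then $A_1 + A_2 \sim W_{m_1 + m_2}(\Sigma)$

• If $A \sim W_m(\Sigma)$ then $CAC'$ is distributed $W_m(C\Sigma C')$

• The distribution of the $(i, i)$ element of $A \sim W_m(\Sigma)$ is $a_{ii} \sim \sigma_{ii}\chi^2_m$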
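
The definition of the Wishart distribution as a sum of outer products can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the original slides: the factor `L`, the seed, and the sizes `m` and `reps` are arbitrary choices. It relies on the fact that $E(A) = m\Sigma$, which follows from $a_{ii} \sim \sigma_{ii}\chi^2_m$ on the diagonal.

```python
import random


def wishart_draw(m, L, rng):
    """One draw of A = sum_{j=1}^m Z_j Z_j' with Z_j ~ N_p(0, L L')."""
    p = len(L)
    A = [[0.0] * p for _ in range(p)]
    for _ in range(m):
        e = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Z = L e has covariance L L' when e ~ N_p(0, I).
        z = [sum(L[i][k] * e[k] for k in range(p)) for i in range(p)]
        for i in range(p):
            for k in range(p):
                A[i][k] += z[i] * z[k]
    return A


rng = random.Random(726)             # arbitrary seed, for reproducibility only
L = [[2.0, 0.0], [0.5, 1.0]]         # hypothetical lower-triangular factor
Sigma = [[4.0, 1.0], [1.0, 1.25]]    # Sigma = L L'
m, reps = 5, 4000

draws = [wishart_draw(m, L, rng) for _ in range(reps)]
mean_a11 = sum(A[0][0] for A in draws) / reps   # should be near m * sigma_11
mean_a12 = sum(A[0][1] for A in draws) / reps   # should be near m * sigma_12
```

With these settings the averages should land close to $m\sigma_{11} = 20$ and $m\sigma_{12} = 5$, up to Monte Carlo error.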

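• Let $X_1, X_2, \dots, X_n$ be a random sample from a population with mean $\mu$ and covariance $\Sigma$ (not necessarily normally distributed)

Then $\bar{X}$ and $S$ are consistent estimators for $\mu$ and $\Sigma$. This means $\bar{X} \xrightarrow{p} \mu$ and $S \xrightarrow{p} \Sigma$ as $n \to \infty$.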

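• If we have a random sample $X_1, X_2, \dots, X_n$ from a population with mean $\mu$ and finite covariance $\Sigma$, we can apply the multivariate central limit theorem as well

• The multivariate CLT says that, for large $n$,

$$\sqrt{n}\,(\bar{X} - \mu) \;\overset{\cdot}{\sim}\; N_p(0, \Sigma)$$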
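
One consequence of the multivariate CLT is that $n(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)$ is approximately $\chi^2_p$ for large $n$, even when the population is not normal. The simulation below is a rough sketch of that idea with $p = 2$; the population, seed, and sample sizes are hypothetical choices made for illustration. It uses the closed form $-2\ln(1 - \alpha)$ for the $\chi^2_2$ quantile.

```python
import math
import random

rng = random.Random(2014)  # arbitrary seed

# Hypothetical non-normal population: X1 ~ Exp(1), X2 = X1 + Uniform(0, 1).
mu = (1.0, 1.5)
Sigma = ((1.0, 1.0), (1.0, 1.0 + 1.0 / 12.0))
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] ** 2


def draw():
    x1 = rng.expovariate(1.0)
    return (x1, x1 + rng.random())


def scaled_distance(n):
    """n (xbar - mu)' Sigma^{-1} (xbar - mu) for one sample of size n."""
    sample = [draw() for _ in range(n)]
    d1 = sum(x[0] for x in sample) / n - mu[0]
    d2 = sum(x[1] for x in sample) / n - mu[1]
    # Quadratic form written out with the explicit 2 x 2 inverse of Sigma.
    quad = (Sigma[1][1] * d1 * d1 - 2 * Sigma[0][1] * d1 * d2
            + Sigma[0][0] * d2 * d2) / det
    return n * quad


n, reps = 200, 2000
cutoff = -2.0 * math.log(0.05)   # 0.95 quantile of chi-square with 2 d.f.
coverage = sum(scaled_distance(n) <= cutoff for _ in range(reps)) / reps
```

If the normal approximation is reasonable at this sample size, `coverage` should be close to 0.95.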

• Check univariate normality for each component of X

• Normal probability plots (i.e. Q-Q plots)

• Tests:

• Shapiro-Wilk

• Correlation

• EDF

• Check bivariate (and higher)

• Bivariate scatter plots

• Chi-square probability plots

• If X1, X2,…, Xn are a random sample from a p-dimensional normal population, then the data for the ith trait are a random sample from a univariate normal distribution (from result 4.2)

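• Q-Q plot

• Order the data: $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$

• Compute the standard normal quantiles $q_{(j)}$ according to $P(Z \le q_{(j)}) = \frac{j - 1/2}{n}$

• Plot the pairs of observations $(q_{(j)}, x_{(j)})$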

• Shapiro-Wilk test

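• An alternative is a modified version of the Shapiro-Wilk test

• Uses the correlation coefficient from the Q-Q plot:

$$r_Q = \frac{\sum_{j=1}^{n}(x_{(j)} - \bar{x})(q_{(j)} - \bar{q})}{\sqrt{\sum_{j=1}^{n}(x_{(j)} - \bar{x})^2}\,\sqrt{\sum_{j=1}^{n}(q_{(j)} - \bar{q})^2}}$$

• Reject normality if $r_Q$ is too small (values in Table 4.2)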
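
The Q-Q construction and the correlation coefficient $r_Q$ can be sketched in a few lines. This is an illustrative sketch, assuming probability levels $(j - 1/2)/n$ for the quantiles; the function names and the data are hypothetical, and the critical value for $r_Q$ still has to be looked up in a table.

```python
from statistics import NormalDist


def qq_pairs(x):
    """Return (q_(j), x_(j)) pairs using probability levels (j - 1/2) / n."""
    xs = sorted(x)
    n = len(xs)
    std_normal = NormalDist()
    q = [std_normal.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    return list(zip(q, xs))


def qq_correlation(x):
    """Correlation coefficient r_Q between ordered data and normal quantiles."""
    pairs = qq_pairs(x)
    n = len(pairs)
    qbar = sum(q for q, _ in pairs) / n
    xbar = sum(v for _, v in pairs) / n
    sxq = sum((v - xbar) * (q - qbar) for q, v in pairs)
    sxx = sum((v - xbar) ** 2 for _, v in pairs)
    sqq = sum((q - qbar) ** 2 for q, _ in pairs)
    return sxq / (sxx * sqq) ** 0.5


x = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]   # hypothetical measurements
pairs = qq_pairs(x)       # points to plot
r_q = qq_correlation(x)   # compare against the tabulated critical value
```

Values of `r_q` near 1 mean the Q-Q plot is close to a straight line.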

• Anderson-Darling and Kolmogorov-Smirnov statistics measure how much the empirical distribution function (EDF) differs from the hypothesized distribution

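• For a univariate normal distribution, the hypothesized CDF is $F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$ and the EDF is $F_n(x) = \frac{1}{n}\,\#\{x_j \le x\}$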

• Large values for either statistic indicate observed data were not sampled from the hypothesized distribution
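
Both EDF statistics have simple computational forms based on the ordered data. The sketch below assumes the normal parameters are estimated from the sample (so standard Kolmogorov-Smirnov tables do not apply directly); the function name and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def edf_statistics(x):
    """Kolmogorov-Smirnov D and Anderson-Darling A^2 against a fitted normal."""
    xs = sorted(x)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))   # parameters estimated from data
    F = [fitted.cdf(v) for v in xs]
    # KS: largest gap between the EDF step function and the fitted CDF.
    D = max(max((j + 1) / n - F[j], F[j] - j / n) for j in range(n))
    # AD: weighted squared distance that puts more weight on the tails.
    A2 = -n - sum(
        (2 * j + 1) * (math.log(F[j]) + math.log(1.0 - F[n - 1 - j]))
        for j in range(n)
    ) / n
    return D, A2


x = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]   # hypothetical measurements
D, A2 = edf_statistics(x)
```

Larger values of `D` or `A2` indicate a bigger gap between the EDF and the fitted normal CDF.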

• You can generate bivariate plots of all pairs of traits and look for unusual observations

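• A chi-square plot checks for normality in p > 2 dimensions

• For each observation compute $d_j^2 = (x_j - \bar{x})' S^{-1} (x_j - \bar{x})$, $j = 1, \dots, n$

• Order these values from smallest to largest: $d_{(1)}^2 \le d_{(2)}^2 \le \dots \le d_{(n)}^2$

• Calculate quantiles $q_{(j)}$ for the chi-squared distribution with p d.f., using $P(\chi^2_p \le q_{(j)}) = \frac{j - 1/2}{n}$

• Plot the pairs $(q_{(j)}, d_{(j)}^2)$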

Do the points deviate too much from a straight line?
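
The steps of the chi-square plot can be sketched as follows. To stay within the standard library, this example is restricted to the bivariate case $p = 2$, where the chi-square quantile has the closed form $-2\ln(1 - \text{prob})$; general $p$ would need a chi-square quantile function from a statistics library. The function name, the probability levels $(j - 1/2)/n$, and the data are assumptions made for illustration.

```python
import math


def chi_square_plot_pairs(X):
    """Pairs (q_(j), d^2_(j)) for bivariate data X given as a list of (x1, x2)."""
    n = len(X)
    m1 = sum(x[0] for x in X) / n
    m2 = sum(x[1] for x in X) / n
    s11 = sum((x[0] - m1) ** 2 for x in X) / (n - 1)
    s22 = sum((x[1] - m2) ** 2 for x in X) / (n - 1)
    s12 = sum((x[0] - m1) * (x[1] - m2) for x in X) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # Squared generalized distances d_j^2 = (x_j - xbar)' S^{-1} (x_j - xbar),
    # using the explicit inverse of the 2 x 2 matrix S, then sorted.
    d2 = sorted(
        (s22 * (x[0] - m1) ** 2
         - 2 * s12 * (x[0] - m1) * (x[1] - m2)
         + s11 * (x[1] - m2) ** 2) / det
        for x in X
    )
    # For p = 2 the chi-square quantile is -2 ln(1 - prob).
    q = [-2.0 * math.log(1.0 - (j - 0.5) / n) for j in range(1, n + 1)]
    return list(zip(q, d2))


# Hypothetical bivariate data, n = 7.
X = [(1.0, 3.2), (2.5, 1.1), (3.1, 4.8), (4.2, 2.9),
     (5.0, 6.3), (6.4, 4.0), (7.3, 7.7)]
pairs = chi_square_plot_pairs(X)
```

A useful sanity check is that the squared distances always sum to $(n - 1)p$, whatever the data.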

Apply normal-based procedures anyway

Hope for the best…

Resampling procedures

Try to identify a more appropriate multivariate distribution

Nonparametric methods

Transformations

Check for outliers

• The idea of transformations is to re-express the data so that they look more normal

• Choosing a suitable transformation can be guided by

• Theoretical considerations

• Count data can often be made to look more normal by using a square root transformation

• The data themselves

• If the choice is not particularly clear, consider power transformations

• Commonly used, but note they are defined only for positive variables

• Defined by a parameter $\lambda$ as follows: $x \mapsto x^{\lambda}$, with $\lambda = 0$ taken to mean $\ln x$

• So what do we use?

• Right-skewed data: consider $\lambda < 1$ (fractions, 0, negative numbers…)

• Left-skewed data: consider $\lambda > 1$

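• Box-Cox transformations are a popular modification of power transformations where

$$x^{(\lambda)} = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \ne 0 \\ \ln x, & \lambda = 0 \end{cases}$$

• Box-Cox transformations determine the best $\lambda$ by maximizing:

$$\ell(\lambda) = -\frac{n}{2}\ln\left[\frac{1}{n}\sum_{j=1}^{n}\left(x_j^{(\lambda)} - \overline{x^{(\lambda)}}\right)^2\right] + (\lambda - 1)\sum_{j=1}^{n}\ln x_j, \qquad \overline{x^{(\lambda)}} = \frac{1}{n}\sum_{j=1}^{n} x_j^{(\lambda)}$$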
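
The Box-Cox criterion is easy to evaluate on a grid of $\lambda$ values. The sketch below is a minimal illustration for a single positive variable; the function names, the grid from $-2$ to $2$, and the data are hypothetical, and a grid search stands in for a proper optimizer.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform: (x^lam - 1) / lam, or ln(x) when lam == 0."""
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]


def box_cox_loglik(x, lam):
    """Criterion l(lam) that Box-Cox maximizes (all x must be positive)."""
    n = len(x)
    y = box_cox(x, lam)
    ybar = sum(y) / n
    var_mle = sum((v - ybar) ** 2 for v in y) / n      # divisor n, not n - 1
    return -0.5 * n * math.log(var_mle) + (lam - 1.0) * sum(math.log(v) for v in x)


# Hypothetical right-skewed positive data.
x = [0.4, 0.7, 0.9, 1.1, 1.6, 2.3, 3.8, 6.5, 11.0, 24.0]
grid = [k / 10 for k in range(-20, 21)]                # lambda from -2 to 2
best_lam = max(grid, key=lambda lam: box_cox_loglik(x, lam))
```

For right-skewed data like this, the maximizing value should fall below 1, in line with the rule of thumb for power transformations.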

• Note, in the multivariate setting, this would be considered for every trait

• However… normality of each individual trait does not guarantee joint normality

• We could iteratively try to search for the best transformations for joint and marginal normality

• May not really improve our results substantially

• And often univariate transformations are good enough in practice

• Be very cautious about rejecting normality

• Examples of normality checks in SAS and R

• Begin our discussion of statistical inference for MV vectors