
Sampling from a MVN Distribution

Sampling from a MVN Distribution. BMTRY 726, 1/17/2014. Sample Mean Vector: we can estimate a sample mean for $X_1, X_2, \ldots, X_n$. Now we can estimate the mean of our sample, but what about the properties of $\bar{X}$? It is an unbiased estimate of the mean.


Presentation Transcript


  1. Sampling from a MVN Distribution BMTRY 726 1/17/2014

  2. Sample Mean Vector • We can estimate a sample mean for $X_1, X_2, \ldots, X_n$: $\bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j$

  3. Sample Mean Vector • Now we can estimate the mean of our sample • But what about the properties of $\bar{X}$? • It is an unbiased estimate of the mean: $E(\bar{X}) = \mu$ • It is a sufficient statistic • Also, the sampling distribution is: $\bar{X} \sim N_p\left(\mu, \tfrac{1}{n}\Sigma\right)$
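The unbiasedness and the $\Sigma/n$ covariance of the sample mean can be checked by simulation. The sketch below is only an illustration: the values of `mu`, `Sigma`, `n`, and `reps` are hypothetical and do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters (p = 2), chosen only for illustration.
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n, reps = 10, 20000

# reps independent samples of size n; the array has shape (reps, n, p).
samples = rng.multivariate_normal(mu, Sigma, size=(reps, n))
x_bars = samples.mean(axis=1)                 # one mean vector per sample

mean_of_means = x_bars.mean(axis=0)           # should be close to mu
cov_of_means = np.cov(x_bars, rowvar=False)   # should be close to Sigma / n

print(mean_of_means)
print(cov_of_means)
print(Sigma / n)
```

With these settings the empirical covariance of the simulated means should sit close to `Sigma / n`, up to Monte Carlo error.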

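  4. Sample Covariance • And the sample covariance for $X_1, X_2, \ldots, X_n$: $S = \frac{1}{n-1}\sum_{j=1}^{n} (X_j - \bar{X})(X_j - \bar{X})'$ • Sample variance: $s_{ii} = \frac{1}{n-1}\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2$ • Sample covariance: $s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)$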
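A minimal sketch of these estimates, assuming the unbiased $n-1$ divisor (consistent with the later statement that $S$ is unbiased). The data matrix `X` is made up for illustration, with rows as observations and columns as traits.

```python
import numpy as np

# Hypothetical data: n = 5 observations on p = 2 traits (rows are observations).
X = np.array([[4.0, 2.0],
              [4.2, 2.1],
              [3.9, 2.0],
              [4.3, 2.1],
              [4.1, 2.2]])
n, p = X.shape

x_bar = X.mean(axis=0)                 # sample mean vector, length p
centered = X - x_bar                   # deviations from the mean
S = centered.T @ centered / (n - 1)    # sample covariance matrix, p x p

# np.cov uses the same n - 1 divisor by default; rowvar=False treats columns as traits.
assert np.allclose(S, np.cov(X, rowvar=False))
print(x_bar)
print(S)
```

The diagonal of `S` holds the sample variances and the off-diagonal entries hold the sample covariances.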

  5. Sample Mean Vector • So we can also estimate the variance of our sample • And like $\bar{X}$, $S$ also has some nice properties • It is an unbiased estimate of the variance: $E(S) = \Sigma$ • It is also a sufficient statistic • It is also independent of $\bar{X}$ • But what about the sampling distribution of $S$?

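  6. Wishart Distribution • Given $Z_1, Z_2, \ldots, Z_n$ iid $N_p(0, \Sigma)$, the distribution of $\sum_{j=1}^{n} Z_j Z_j'$ is called a Wishart distribution with $n$ degrees of freedom, written $W_n(\Sigma)$ • $(n-1)S$ has a Wishart distribution with $n-1$ degrees of freedom • The density function is $w_{n-1}(A \mid \Sigma) = \dfrac{|A|^{(n-p-2)/2}\, e^{-\mathrm{tr}(A\Sigma^{-1})/2}}{2^{p(n-1)/2}\, \pi^{p(p-1)/4}\, |\Sigma|^{(n-1)/2} \prod_{i=1}^{p} \Gamma\left(\tfrac{1}{2}(n-i)\right)}$ where $A$ and $\Sigma$ are positive definite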

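  7. Wishart cont’d • The Wishart distribution is the multivariate analog of the central chi-squared distribution. • If $A_1 \sim W_{m_1}(\Sigma)$ and $A_2 \sim W_{m_2}(\Sigma)$ are independent then $A_1 + A_2 \sim W_{m_1+m_2}(\Sigma)$ • If $A \sim W_m(\Sigma)$ then $CAC'$ is distributed $W_m(C\Sigma C')$ • The distribution of the $(i, i)$ element of $A$ is $\sigma_{ii}\chi^2_m$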
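The Wishart construction can be illustrated by simulation: build $A = \sum_j Z_j Z_j'$ from $m$ independent $N_p(0, \Sigma)$ vectors and check that its mean is close to $m\Sigma$ and that $a_{11}/\sigma_{11}$ behaves like a chi-squared variable with $m$ degrees of freedom (mean $m$, variance $2m$). This is a sketch under assumed values; `Sigma`, `m`, and `reps` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])   # hypothetical covariance matrix
m, reps = 8, 20000               # degrees of freedom, number of simulated draws

# Z has shape (reps, m, p): each draw uses m iid N_p(0, Sigma) row vectors.
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=(reps, m))
A = Z.transpose(0, 2, 1) @ Z     # sum_j Z_j Z_j' per draw, shape (reps, p, p)

mean_A = A.mean(axis=0)                  # should be close to m * Sigma
scaled_a11 = A[:, 0, 0] / Sigma[0, 0]    # should look like chi-square with m d.f.

print(mean_A)
print(scaled_a11.mean(), scaled_a11.var())   # roughly m and 2m
```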

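  8. Large Sample Behavior • Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\Sigma$ (not necessarily normally distributed) • Then $\bar{X}$ and $S$ are consistent estimators for $\mu$ and $\Sigma$ • This means $\bar{X} \xrightarrow{P} \mu$ and $S \xrightarrow{P} \Sigma$ as $n \to \infty$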

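  9. Large Sample Behavior • If we have a random sample $X_1, X_2, \ldots, X_n$ from a population with mean $\mu$ and variance $\Sigma$, we can apply the multivariate central limit theorem as well • The multivariate CLT says $\sqrt{n}\,(\bar{X} - \mu) \xrightarrow{d} N_p(0, \Sigma)$ as $n \to \infty$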
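One consequence of the multivariate CLT is that $n(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)$ is approximately chi-squared with $p$ degrees of freedom for large $n$, even when the population is not normal. The sketch below checks this on a deliberately skewed, hypothetical population (sums of exponentials); the population, `n`, and `reps` are assumptions made for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 5000

# Hypothetical non-normal population: X1 = E1, X2 = E1 + E2 with E1, E2 iid Exp(1).
E = rng.exponential(1.0, size=(reps, n, 2))
X = np.stack([E[..., 0], E[..., 0] + E[..., 1]], axis=-1)

mu = np.array([1.0, 2.0])          # true mean of (X1, X2)
Sigma = np.array([[1.0, 1.0],
                  [1.0, 2.0]])     # true covariance of (X1, X2)

diff = X.mean(axis=1) - mu         # (reps, 2) deviations of the sample means
# n (x_bar - mu)' Sigma^{-1} (x_bar - mu) for each replicate
stat = n * np.einsum("ri,ij,rj->r", diff, np.linalg.inv(Sigma), diff)

# For p = 2 the chi-square 95% quantile has the closed form -2 ln(0.05).
cutoff = -2.0 * math.log(0.05)
coverage = np.mean(stat <= cutoff)
print(coverage)   # near 0.95 if the normal approximation is adequate
```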

  10. Checking Normality Assumptions • Check univariate normality for each component of X: normal probability plots (i.e., Q-Q plots) and tests (Shapiro-Wilk, correlation, EDF) • Check bivariate (and higher) normality: bivariate scatter plots and chi-square probability plots

  11. Univariate Methods • If $X_1, X_2, \ldots, X_n$ are a random sample from a p-dimensional normal population, then the data for the ith trait are a random sample from a univariate normal distribution (from result 4.2) • Q-Q plot: • Order the data: $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ • Compute the standard normal quantiles $q_{(j)}$ according to $P(Z \le q_{(j)}) = \frac{j - 1/2}{n}$ • Plot the pairs of observations $(q_{(j)}, x_{(j)})$
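The Q-Q plot steps can be sketched with the standard library alone. This assumes the common plotting positions $(j - 1/2)/n$ for the normal quantiles, which may differ from the convention used in the lecture; the data values are hypothetical.

```python
from statistics import NormalDist

x = [1.2, -0.4, 0.3, 2.1, -1.0, 0.8, 0.1]   # hypothetical data for one trait
n = len(x)

x_sorted = sorted(x)                                    # step 1: order the data
probs = [(j - 0.5) / n for j in range(1, n + 1)]        # assumed plotting positions
q = [NormalDist().inv_cdf(prob) for prob in probs]      # step 2: normal quantiles
pairs = list(zip(q, x_sorted))                          # step 3: points to plot

for q_j, x_j in pairs:
    print(f"{q_j:7.3f}  {x_j:5.1f}")
```

Plotting `pairs` with any graphics library gives the Q-Q plot; points close to a straight line are consistent with normality.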

  12. Correlation Tests • Shapiro-Wilk test • Alternative is a modified version of the Shapiro-Wilk test • Uses the correlation coefficient $r_Q$ from the Q-Q plot • Reject normality if $r_Q$ is too small (values in Table 4.2)
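The Q-Q correlation coefficient is the ordinary correlation between the ordered data and the matching normal quantiles. A minimal sketch follows, assuming plotting positions $(j - 1/2)/n$; the function name `qq_correlation` and the sample are hypothetical, and the critical values from Table 4.2 are not reproduced here.

```python
import math
from statistics import NormalDist

def qq_correlation(x):
    """Correlation between ordered data and standard normal quantiles."""
    n = len(x)
    xs = sorted(x)
    q = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    x_bar = sum(xs) / n
    q_bar = sum(q) / n
    sxq = sum((a - x_bar) * (b - q_bar) for a, b in zip(xs, q))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    sqq = sum((b - q_bar) ** 2 for b in q)
    return sxq / math.sqrt(sxx * sqq)

sample = [1.2, -0.4, 0.3, 2.1, -1.0, 0.8, 0.1]   # hypothetical data
r_q = qq_correlation(sample)
print(round(r_q, 4))
```

The value would then be compared with the tabulated critical value for the sample size and significance level in use.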

  13. Empirical Distribution Tests • Anderson-Darling and Kolmogorov-Smirnov statistics measure how much the empirical distribution function (EDF) differs from the hypothesized distribution • For a univariate normal distribution • Large values for either statistic indicate observed data were not sampled from the hypothesized distribution
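Both EDF statistics have simple computational forms once the data are sorted. The sketch below uses the standard formulas for a fully specified $N(\mu, \sigma^2)$ hypothesis; the function name `edf_statistics` and the data are hypothetical. When $\mu$ and $\sigma$ are estimated from the same data, the usual critical values no longer apply, so the numbers here are descriptive only.

```python
import math
from statistics import NormalDist

def edf_statistics(x, mu, sigma):
    """Return (D, A2): Kolmogorov-Smirnov and Anderson-Darling statistics
    for the hypothesized N(mu, sigma^2) distribution."""
    n = len(x)
    F = [NormalDist(mu, sigma).cdf(v) for v in sorted(x)]
    # K-S: largest vertical gap between the EDF (a step function) and F.
    d_plus = max((j + 1) / n - F[j] for j in range(n))
    d_minus = max(F[j] - j / n for j in range(n))
    D = max(d_plus, d_minus)
    # A-D: weighted squared distance between EDF and F, computational form.
    s = sum((2 * j + 1) * (math.log(F[j]) + math.log(1 - F[n - 1 - j]))
            for j in range(n))
    A2 = -n - s / n
    return D, A2

data = [1.2, -0.4, 0.3, 2.1, -1.0, 0.8, 0.1]   # hypothetical data
D, A2 = edf_statistics(data, mu=0.0, sigma=1.0)
print(D, A2)
```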

  14. Multivariate Methods • You can generate bivariate plots of all pairs of traits and look for unusual observations • A chi-square plot checks for normality in $p \ge 2$ dimensions • For each observation compute $d_j^2 = (x_j - \bar{x})' S^{-1} (x_j - \bar{x})$ • Order these values from smallest to largest: $d_{(1)}^2 \le d_{(2)}^2 \le \cdots \le d_{(n)}^2$ • Calculate quantiles $q_{c,p}\left(\frac{j - 1/2}{n}\right)$ for the chi-squared distribution with p d.f.

  15. Multivariate Methods • Plot the pairs $\left(q_{c,p}\left(\frac{j - 1/2}{n}\right), d_{(j)}^2\right)$ • Do the points deviate too much from a straight line?
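The chi-square plot can be sketched for the bivariate case, using the squared generalized distances $d_j^2 = (x_j - \bar{x})' S^{-1} (x_j - \bar{x})$ and assumed plotting positions $(j - 1/2)/n$. To stay self-contained, the example fixes $p = 2$, where the chi-squared quantile has the closed form $-2\ln(1 - u)$; for other $p$ a statistics library would be needed. The sample is simulated and purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical bivariate sample (n = 30, p = 2), simulated only for illustration.
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=30)
n, p = X.shape

centered = X - X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
# Squared generalized distance of each observation from the sample mean.
d2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)
d2_sorted = np.sort(d2)

# Chi-square quantiles with 2 d.f.: the CDF is 1 - exp(-q/2), so q = -2 ln(1 - u).
probs = (np.arange(1, n + 1) - 0.5) / n
q = -2.0 * np.log(1.0 - probs)

pairs = np.column_stack([q, d2_sorted])   # points for the chi-square plot
print(pairs[:5])
```

For multivariate normal data the plotted points should fall roughly along a line through the origin with slope 1.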

  16. Things to Do with non-MVN Data • Apply normal based procedures anyway • Hope for the best… • Resampling procedures • Try to identify a more appropriate multivariate distribution • Nonparametric methods • Transformations • Check for outliers

  17. Transformations • The idea of transformations is to re-express the data to make it more normal looking • Choosing a suitable transformation can be guided by • Theoretical considerations • Count data can often be made to look more normal by using a square root transformation • The data themselves • If the choice is not particularly clear consider power transformations

  18. Power Transformations • Commonly used, but note they are defined only for positive variables • Defined by a parameter $\lambda$ as follows: $x^{\lambda}$, with $\lambda = 0$ taken to mean $\ln x$ • So what do we use? • Right skewed data: consider $\lambda < 1$ (fractions, 0, negative numbers…) • Left skewed data: consider $\lambda > 1$
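The effect of the power parameter on skewness can be seen on a small example. This sketch assumes the usual convention that a power of 0 stands for the natural log; the helper names `power_transform` and `skewness` and the data are hypothetical.

```python
import math

def power_transform(x, lam):
    """x**lam for lam != 0, and ln(x) for lam == 0 (positive x only)."""
    if any(v <= 0 for v in x):
        raise ValueError("power transformations need positive data")
    return [math.log(v) if lam == 0 else v ** lam for v in x]

def skewness(x):
    """Moment-based sample skewness m3 / m2**1.5."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

data = [1, 2, 2, 3, 4, 6, 9, 15, 40]   # hypothetical right-skewed data
for lam in (1, 0.5, 0):
    print(lam, round(skewness(power_transform(data, lam)), 3))
```

For this right-skewed sample, moving the power from 1 to 0.5 to 0 reduces the skewness at each step.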

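  19. Power Transformations • Box-Cox transformations are a popular modification of power transformations where $x^{(\lambda)} = \dfrac{x^{\lambda} - 1}{\lambda}$ for $\lambda \ne 0$ and $x^{(\lambda)} = \ln x$ for $\lambda = 0$ • Box-Cox transformations determine the best $\lambda$ by maximizing: $\ell(\lambda) = -\dfrac{n}{2} \ln\left[\dfrac{1}{n}\sum_{j=1}^{n}\left(x_j^{(\lambda)} - \overline{x^{(\lambda)}}\right)^2\right] + (\lambda - 1)\sum_{j=1}^{n} \ln x_j$, where $\overline{x^{(\lambda)}} = \dfrac{1}{n}\sum_{j=1}^{n} x_j^{(\lambda)}$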
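A grid search over the Box-Cox parameter can be sketched as follows, using the standard profile log-likelihood $-\frac{n}{2}\ln\left[\frac{1}{n}\sum_j (x_j^{(\lambda)} - \overline{x^{(\lambda)}})^2\right] + (\lambda - 1)\sum_j \ln x_j$. The function names, the grid, and the data are hypothetical; the data are constructed so that the log scale is symmetric, so the search should land near 0.

```python
import numpy as np
from statistics import NormalDist

def boxcox(x, lam):
    """Box-Cox transform: (x**lam - 1)/lam, or ln(x) when lam is 0."""
    return np.log(x) if abs(lam) < 1e-12 else (x ** lam - 1.0) / lam

def boxcox_loglik(x, lam):
    """Profile log-likelihood for the Box-Cox parameter, up to a constant."""
    n = len(x)
    y = boxcox(x, lam)
    return -0.5 * n * np.log(np.mean((y - y.mean()) ** 2)) + (lam - 1.0) * np.log(x).sum()

# Hypothetical right-skewed data: exp of evenly spaced normal quantiles,
# so the log scale is symmetric by construction.
n = 50
x = np.exp([NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)])

grid = np.linspace(-2.0, 2.0, 41)   # candidate values in steps of 0.1
loglik = np.array([boxcox_loglik(x, lam) for lam in grid])
best_lam = grid[np.argmax(loglik)]
print(best_lam)
```

In the multivariate setting the same search would be repeated for each trait separately.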

  20. Transformations • Note, in the multivariate setting, this would be considered for every trait • However… normality of each individual trait does not guarantee joint normality • We could iteratively try to search for the best transformations for joint and marginal normality • May not really improve our results substantially • And often univariate transformations are good enough in practice • Be very cautious about rejecting normality

  21. Next Time • Examples of normality checks in SAS and R • Begin our discussion of statistical inference for MV vectors
