Basic Results in Probability and Statistics. KNNL – Appendix A. A.1 Summation and Product Operators. A.2 Probability. A.3 Random Variables (Univariate). A.3 Random Variables (Bivariate). A.3 Covariance, Correlation, Independence. Linear Functions of RVs.
Basic Results in Probability and Statistics KNNL – Appendix A
Central Limit Theorem
• When random samples of size n are selected from any population with mean μ and finite variance σ², the sampling distribution of the sample mean is approximately normal for large n: Ȳ ~ approx. N(μ, σ²/n)
• The Z-table can be used to approximate probabilities of ranges of values for sample means, as well as percentiles of their sampling distribution
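The Central Limit Theorem statement above can be checked with a small simulation. This is a minimal sketch, not part of the original slides: the Exponential(1) population (mean 1, variance 1), the sample size n = 30, the seed, and the function name `simulate_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from an Exponential(1) population
    (mean 1, variance 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]


means = simulate_sample_means(n=30, reps=5000)
center = statistics.fmean(means)   # should be close to mu = 1
spread = statistics.stdev(means)   # should be close to sigma / sqrt(n) = 1 / sqrt(30)
```

Even though the exponential population is strongly skewed, the simulated sample means should center near μ with standard deviation near σ/√n, which is what the normal approximation uses.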
Normal (Gaussian) Distribution
• Bell-shaped distribution with tendency for individuals to clump around the group median/mean
• Used to model many biological phenomena
• Many estimators have approximate normal sampling distributions (see Central Limit Theorem)
• Notation: Y ~ N(μ, σ²) where μ is the mean and σ² is the variance
Obtaining Probabilities in EXCEL: To obtain F(y) = P(Y ≤ y), use function =NORMDIST(y,μ,σ,1)
Table B.1 (p. 1316) gives the cdf for standardized normal random variables, z = (y-μ)/σ ~ N(0,1), for values of z ≥ 0 (obtain tail probabilities by complements and symmetry)
[Table B.1 layout: rows give the integer part and first decimal place of z; columns give the second decimal place of z]
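The table look-up described above (standardize, read the cdf, then use complements and symmetry for tails) can be sketched with Python's standard library. The helper name `normal_cdf` and the N(100, 15²) example values are illustrative assumptions, not taken from the slides.

```python
from statistics import NormalDist


def normal_cdf(y, mu, sigma):
    """P(Y <= y) for Y ~ N(mu, sigma^2), computed by standardizing to z."""
    z = (y - mu) / sigma
    return NormalDist().cdf(z)


p_lower = normal_cdf(115.0, mu=100.0, sigma=15.0)   # z = 1, lower-tail probability
p_upper = 1.0 - p_lower                             # complement: P(Y > 115)
p_mirror = normal_cdf(85.0, mu=100.0, sigma=15.0)   # symmetry: P(Y <= 85) equals P(Y > 115)
```

Here `p_lower` plays the role of the Table B.1 entry for z = 1.00, and the last two lines show why a table restricted to z ≥ 0 is sufficient.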
Chi-Square Distribution
• Indexed by "degrees of freedom (ν)": X ~ χ²(ν)
• Z ~ N(0,1) ⇒ Z² ~ χ²(1)
• Assuming independence: X1 ~ χ²(ν1), X2 ~ χ²(ν2) ⇒ X1 + X2 ~ χ²(ν1+ν2)
Obtaining Probabilities in EXCEL: To obtain 1-F(x) = P(X ≥ x), use function =CHIDIST(x,ν)
Table B.3 (p. 1319) gives percentiles of χ² distributions: P{χ²(ν) ≤ χ²(A;ν)} = A
Critical Values for Chi-Square Distributions (Mean=ν, Variance=2ν)
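The mean and variance quoted for the chi-square distribution can be illustrated by simulation, building each draw as a sum of ν squared independent standard normal values. This is a sketch under assumed settings (ν = 5, 20000 replications, a fixed seed); the function name `simulate_chi_square` is hypothetical.

```python
import random
import statistics


def simulate_chi_square(nu, reps, seed=1):
    """Return `reps` draws, each the sum of nu squared independent N(0,1) values."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
            for _ in range(reps)]


draws = simulate_chi_square(nu=5, reps=20000)
sim_mean = statistics.fmean(draws)      # should be close to nu = 5
sim_var = statistics.variance(draws)    # should be close to 2 * nu = 10
```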
Student’s t-Distribution
• Indexed by "degrees of freedom (ν)": T ~ t(ν)
• Z ~ N(0,1), X ~ χ²(ν)
• Assuming independence of Z and X: T = Z / √(X/ν) ~ t(ν)
Obtaining Probabilities in EXCEL: To obtain 1-F(t) = P(T ≥ t) for t ≥ 0, use function =TDIST(t,ν,1)
Table B.2 (pp. 1317-1318) gives percentiles of the t-distribution: P{t(ν) ≤ t(A;ν)} = A for A > 0.5; for A < 0.5, use symmetry: t(A;ν) = -t(1-A;ν)
Critical Values for Student’s t-Distributions (Mean=0 for ν>1, Variance=ν/(ν-2) for ν>2)
F-Distribution
• Indexed by 2 "degrees of freedom (ν1,ν2)": W ~ F(ν1,ν2)
• X1 ~ χ²(ν1), X2 ~ χ²(ν2)
• Assuming independence of X1 and X2: W = (X1/ν1) / (X2/ν2) ~ F(ν1,ν2)
Obtaining Probabilities in EXCEL: To obtain 1-F(w) = P(W ≥ w), use function =FDIST(w,ν1,ν2)
Table B.4 (pp. 1320-1326) gives percentiles of the F-distribution: P{F(ν1,ν2) ≤ F(A;ν1,ν2)} = A for values of A > 0.5. For values of A < 0.5 (lower tail probabilities), swap the degrees of freedom and take the reciprocal: F(A;ν1,ν2) = 1/F(1-A;ν2,ν1)
Critical Values for F-distributions P(F ≤ Table Value) = 0.95
A.5 Statistical Estimation - Properties Note: If an estimator is unbiased (easy to show) and its variance goes to zero as its sample size gets infinitely large (easy to show), it is consistent. It is tougher to show that it is Minimum Variance, but general results have been obtained in many standard cases.
One-Sample Confidence Interval for μ
• SRS from a population with mean μ is obtained
• Sample mean (Ȳ) and sample standard deviation (s) are obtained
• Degrees of freedom df = n-1 and confidence level (1-α) are selected
• Level (1-α) confidence interval of form: Ȳ ± t(1-α/2; n-1) · s/√n
Procedure is theoretically derived based on normally distributed data, but has been found to work well regardless for moderate to large n
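The interval Ȳ ± t(1-α/2; n-1)·s/√n can be computed as in the sketch below. The data values are made up for illustration, and the critical value t(0.975; 9) = 2.262 is passed in as if read from a t table, since the Python standard library has no t-distribution; the function name `t_confidence_interval` is hypothetical.

```python
import math
import statistics


def t_confidence_interval(data, t_crit):
    """Return (lower, upper) for ybar +/- t_crit * s / sqrt(n).

    t_crit is the table value t(1 - alpha/2; n - 1), supplied by the caller.
    """
    n = len(data)
    ybar = statistics.fmean(data)
    s = statistics.stdev(data)              # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / math.sqrt(n)
    return ybar - half_width, ybar + half_width


# Hypothetical sample of n = 10 measurements
sample = [12.1, 11.6, 12.4, 12.0, 11.8, 12.3, 11.9, 12.2, 12.0, 11.7]
T_975_DF9 = 2.262                           # t(0.975; 9), for a 95% interval
lower, upper = t_confidence_interval(sample, T_975_DF9)
```

For this sample the interval is roughly (11.82, 12.18), centered at the sample mean of 12.0.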
1-Sample t-test (2-tailed alternative)
• 2-sided Test: H0: μ = μ0   Ha: μ ≠ μ0
• Decision Rule:
• Conclude μ > μ0 if Test Statistic (t*) > t(1-α/2; n-1)
• Conclude μ < μ0 if Test Statistic (t*) < -t(1-α/2; n-1)
• Do not conclude μ ≠ μ0 otherwise
• P-value: 2P(t(n-1) ≥ |t*|)
• Test Statistic: t* = (Ȳ - μ0) / (s/√n)
See Table A.1, p. 1307 for decision rules on 1-sided tests
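The 2-sided decision rule above can be sketched as follows. The sample, the null value μ0 = 11.7, and the function name `one_sample_t_test` are illustrative assumptions; the critical value t(0.975; 9) = 2.262 is supplied from a t table. The P-value is not computed here because it needs the t cdf, which the standard library does not provide.

```python
import math
import statistics


def one_sample_t_test(data, mu0, t_crit):
    """Return (t_star, conclusion) for the 2-sided one-sample t-test.

    t_crit is the table value t(1 - alpha/2; n - 1), supplied by the caller.
    """
    n = len(data)
    t_star = (statistics.fmean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))
    if t_star > t_crit:
        conclusion = "mu > mu0"
    elif t_star < -t_crit:
        conclusion = "mu < mu0"
    else:
        conclusion = "do not conclude mu != mu0"
    return t_star, conclusion


# Hypothetical sample of n = 10 measurements, tested against mu0 = 11.7
sample = [12.1, 11.6, 12.4, 12.0, 11.8, 12.3, 11.9, 12.2, 12.0, 11.7]
t_star, conclusion = one_sample_t_test(sample, mu0=11.7, t_crit=2.262)
```

With these numbers t* is about 3.67, which exceeds 2.262, so the rule concludes μ > μ0 at α = 0.05.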
Comparing 2 Means - Independent Samples
• Observed individuals from the 2 groups are samples from distinct populations (identified by (μ1,σ1²) and (μ2,σ2²))
• Measurements across groups are independent
• Summary statistics obtained from the 2 groups: sample sizes (n1, n2), sample means (Ȳ1, Ȳ2), sample standard deviations (s1, s2)
Sampling Distribution of Ȳ1 - Ȳ2
• Underlying distributions normal ⇒ sampling distribution is normal, and resulting t-distribution with estimated std. dev.
• Mean, variance, standard error (Std. Dev. of estimator): E{Ȳ1 - Ȳ2} = μ1 - μ2, σ²{Ȳ1 - Ȳ2} = σ1²/n1 + σ2²/n2, σ{Ȳ1 - Ȳ2} = √(σ1²/n1 + σ2²/n2)
Inference for μ1-μ2 - Normal Populations – Equal variances
• Interpretation of the (1-α)100% confidence interval for μ1-μ2 (at the α significance level):
• If interval contains 0, do not reject H0: μ1 = μ2
• If interval is strictly positive, conclude that μ1 > μ2
• If interval is strictly negative, conclude that μ1 < μ2
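The interval formula itself did not survive on this slide. The sketch below assumes the standard pooled-variance form for the equal-variance case: (Ȳ1 - Ȳ2) ± t(1-α/2; n1+n2-2)·√(s_p²(1/n1 + 1/n2)), with s_p² = ((n1-1)s1² + (n2-1)s2²)/(n1+n2-2). The data, the function name `pooled_t_interval`, and the table value t(0.975; 10) = 2.228 are illustrative assumptions.

```python
import math
import statistics


def pooled_t_interval(y1, y2, t_crit):
    """Return (lower, upper) for mu1 - mu2 using the pooled variance estimate.

    t_crit is the table value t(1 - alpha/2; n1 + n2 - 2), supplied by the caller.
    """
    n1, n2 = len(y1), len(y2)
    s2_pooled = ((n1 - 1) * statistics.variance(y1)
                 + (n2 - 1) * statistics.variance(y2)) / (n1 + n2 - 2)
    std_err = math.sqrt(s2_pooled * (1.0 / n1 + 1.0 / n2))
    diff = statistics.fmean(y1) - statistics.fmean(y2)
    return diff - t_crit * std_err, diff + t_crit * std_err


# Hypothetical samples, n1 = n2 = 6, so df = 10
group1 = [10, 12, 11, 13, 12, 14]
group2 = [8, 9, 10, 9, 11, 7]
lower, upper = pooled_t_interval(group1, group2, t_crit=2.228)
```

For these samples the interval is roughly (1.18, 4.82). It is strictly positive, so by the interpretation above one would conclude μ1 > μ2.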
Sampling Distribution of s² (Normal Data)
• Population variance (σ²) is a fixed (unknown) parameter based on the population of measurements
• Sample variance (s²) varies from sample to sample (just as sample mean does)
• When Y ~ N(μ,σ²), the distribution of (a multiple of) s² is Chi-Square with n-1 degrees of freedom
• (n-1)s²/σ² ~ χ² with df = n-1
(1-α)100% Confidence Interval for σ² (or σ)
• Step 1: Obtain a random sample of n items from the population, compute s²
• Step 2: Obtain χ²L = χ²(α/2; n-1) and χ²U = χ²(1-α/2; n-1) from table of critical values for chi-square distribution with n-1 df
• Step 3: Compute the confidence interval for σ² as ((n-1)s²/χ²U, (n-1)s²/χ²L) and take square roots of the bounds for σ² to obtain the confidence interval for σ
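The three steps above can be sketched as follows. The sample is made up, the function name `variance_confidence_interval` is hypothetical, and the chi-square critical values for 9 df (2.700 and 19.023) are passed in as table look-ups because the standard library has no chi-square distribution.

```python
import math
import statistics


def variance_confidence_interval(data, chi2_lower, chi2_upper):
    """Return ((var_lo, var_hi), (sd_lo, sd_hi)).

    chi2_lower = chi2(alpha/2; n-1) and chi2_upper = chi2(1-alpha/2; n-1)
    are table values supplied by the caller.
    """
    n = len(data)
    ss = (n - 1) * statistics.variance(data)         # (n - 1) * s^2
    var_interval = (ss / chi2_upper, ss / chi2_lower)
    sd_interval = tuple(math.sqrt(bound) for bound in var_interval)
    return var_interval, sd_interval


# Hypothetical sample of n = 10 measurements; 95% interval uses df = 9
sample = [12.1, 11.6, 12.4, 12.0, 11.8, 12.3, 11.9, 12.2, 12.0, 11.7]
var_ci, sd_ci = variance_confidence_interval(sample, chi2_lower=2.700, chi2_upper=19.023)
```

Note that the larger critical value produces the lower bound, because it appears in the denominator.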
Statistical Test for σ²
• Null and alternative hypotheses
• 1-sided (upper tail): H0: σ² ≤ σ0²   Ha: σ² > σ0²
• 1-sided (lower tail): H0: σ² ≥ σ0²   Ha: σ² < σ0²
• 2-sided: H0: σ² = σ0²   Ha: σ² ≠ σ0²
• Test Statistic: χ²obs = (n-1)s²/σ0²
• Decision Rule based on chi-square distribution w/ df = n-1:
• 1-sided (upper tail): Reject H0 if χ²obs > χ²U = χ²(1-α; n-1)
• 1-sided (lower tail): Reject H0 if χ²obs < χ²L = χ²(α; n-1)
• 2-sided: Reject H0 if χ²obs < χ²L = χ²(α/2; n-1) (Conclude σ² < σ0²) or if χ²obs > χ²U = χ²(1-α/2; n-1) (Conclude σ² > σ0²)
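The 2-sided version of this test can be sketched as below. The sample, the null value σ0² = 0.25, and both function names are illustrative assumptions; the critical values χ²(0.025; 9) = 2.700 and χ²(0.975; 9) = 19.023 are supplied as table look-ups.

```python
import statistics


def variance_test_statistic(data, sigma0_sq):
    """Return chi2_obs = (n - 1) * s^2 / sigma0^2."""
    n = len(data)
    return (n - 1) * statistics.variance(data) / sigma0_sq


def two_sided_decision(chi2_obs, chi2_lower, chi2_upper):
    """Apply the 2-sided rule with table values chi2(alpha/2; n-1), chi2(1-alpha/2; n-1)."""
    if chi2_obs < chi2_lower:
        return "conclude sigma^2 < sigma0^2"
    if chi2_obs > chi2_upper:
        return "conclude sigma^2 > sigma0^2"
    return "do not reject H0"


# Hypothetical sample of n = 10 measurements, tested against sigma0^2 = 0.25
sample = [12.1, 11.6, 12.4, 12.0, 11.8, 12.3, 11.9, 12.2, 12.0, 11.7]
chi2_obs = variance_test_statistic(sample, sigma0_sq=0.25)
decision = two_sided_decision(chi2_obs, chi2_lower=2.700, chi2_upper=19.023)
```

Here the observed statistic is about 2.4, which falls below the lower critical value of 2.700.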
Inferences Regarding 2 Population Variances
• Goal: Compare variances between 2 populations
• Parameter: σ1²/σ2² (Ratio is 1 when variances are equal)
• Estimator: s1²/s2² (Ratio of sample variances)
• Distribution of (multiple of) estimator (Normal Data): (s1²/σ1²) / (s2²/σ2²) ~ F-distribution with parameters df1 = n1-1 and df2 = n2-1
Test Comparing Two Population Variances • Assumption: the 2 populations are normally distributed
(1-α)100% Confidence Interval for σ1²/σ2²
• Obtain ratio of sample variances s1²/s2² = (s1/s2)²
• Choose α, and obtain:
• FL = F(α/2, n1-1, n2-1) = 1/F(1-α/2, n2-1, n1-1)
• FU = F(1-α/2, n1-1, n2-1)
• Compute Confidence Interval: ((s1²/s2²)/FU, (s1²/s2²)/FL)
Conclude population variances unequal if interval does not contain 1
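A minimal sketch of this interval, using the fact that (s1²/σ1²)/(s2²/σ2²) ~ F(n1-1, n2-1), so the bounds are the sample ratio divided by FU and by FL. The data and the function name `variance_ratio_interval` are hypothetical, and the table value F(0.975; 9, 9) ≈ 4.03 is supplied by hand. Because n1 = n2 here, the same table value gives both FU and FL = 1/F(0.975; 9, 9).

```python
import statistics


def variance_ratio_interval(y1, y2, f_lower, f_upper):
    """Return (lower, upper) for sigma1^2 / sigma2^2.

    f_lower = F(alpha/2; n1-1, n2-1) and f_upper = F(1-alpha/2; n1-1, n2-1)
    are table values supplied by the caller.
    """
    ratio = statistics.variance(y1) / statistics.variance(y2)
    return ratio / f_upper, ratio / f_lower


# Hypothetical samples with n1 = n2 = 10
group1 = [5, 7, 6, 4, 8, 6, 5, 7, 6, 6]
group2 = [10, 14, 12, 9, 15, 11, 13, 12, 10, 14]
F_UPPER = 4.026                  # approximate table value F(0.975; 9, 9)
F_LOWER = 1.0 / F_UPPER          # F(0.025; 9, 9) = 1 / F(0.975; 9, 9)
lower, upper = variance_ratio_interval(group1, group2, F_LOWER, F_UPPER)
```

The sample ratio is 1/3 and the interval is roughly (0.08, 1.34). It contains 1, so these data would not lead to the conclusion that the population variances differ.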