

  1. DATA ANALYSIS Module Code: CA660 Lecture Block 5

  2. ESTIMATION - Rationale, Summary & Other Features • Estimator validity - how good? • Basis: statistical properties (variance, bias, distributional, etc.) • Bias = E[θ̂] − θ, where θ̂ is the point estimate and θ the true parameter. Bias can be positive, negative or zero. Permits calculation of other properties, e.g. M.S.E. = E[(θ̂ − θ)²] = Var(θ̂) + Bias², where this quantity and the variance of the estimator are the same if the estimator is unbiased. Obtained by both analytical and "bootstrap" methods. Similarly for continuous variables; for b bootstrap replications, the bootstrap bias estimate is the mean of the b replicate estimates minus θ̂.
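
A minimal sketch of the bootstrap bias estimate just described (Python with NumPy assumed; the exponential sample and the divide-by-n variance estimator are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(42)


def bootstrap_bias(sample, statistic, b=2000, rng=rng):
    """Bootstrap bias estimate: mean of b replicate statistics
    minus the statistic evaluated on the original sample."""
    n = len(sample)
    replicates = np.array([
        statistic(rng.choice(sample, size=n, replace=True))
        for _ in range(b)
    ])
    return replicates.mean() - statistic(sample)


# Illustrative data and a known-biased statistic: the divide-by-n
# variance estimator, whose analytical bias is -sigma^2 / n.
x = rng.exponential(scale=2.0, size=30)
biased_var = lambda s: np.var(s)            # ddof=0, biased downward
print("bootstrap bias estimate:", bootstrap_bias(x, biased_var))
print("analytical bias (-sigma^2/n):", -2.0**2 / 30)
```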

  3. Estimation Rationale etc. - contd. • For any estimator θ̂, even an unbiased one, there is a difference between the estimator and the true parameter = sampling error. Hence the need for probability statements around θ̂: P{T1 ≤ θ ≤ T2} = 1 − α, with C.I. for the estimator = (T1, T2), similarly to before, and 1 − α the confidence coefficient. In other words, if the estimator is unbiased, 1 − α is the probability that the true parameter falls into the interval. • In general, confidence intervals can be determined using parametric and non-parametric approaches, where parametric construction needs a pivotal quantity = a variable which is a function of the parameter and the data, but whose distribution does not depend on the parameter.
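
A concrete sketch of the pivotal-quantity construction (Python with NumPy/SciPy assumed; the simulated data and σ = 2 are illustrative): for normal data with known σ, U = (X̄ − μ)/(σ/√n) is pivotal - standard normal whatever μ is - which yields the familiar interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sigma, n, alpha = 2.0, 50, 0.05
x = rng.normal(loc=10.0, scale=sigma, size=n)   # illustrative sample

# Pivot U = (xbar - mu) / (sigma / sqrt(n)) ~ N(0,1), free of mu.
u = stats.norm.ppf(1 - alpha / 2)               # 1.96 for alpha = 0.05
half_width = u * sigma / np.sqrt(n)
xbar = x.mean()
print(f"{100 * (1 - alpha):.0f}% C.I. for mu: "
      f"({xbar - half_width:.3f}, {xbar + half_width:.3f})")
```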

  4. HYPOTHESIS TESTING - Rationale • Starting point of scientific research, e.g. no genetic linkage between the genetic markers and genes when we design a linkage mapping experiment: H0: r = 0.5 (no linkage) vs. H1: r ≠ 0.5 (two loci linked, with specified R.F. = 0.2, say), for a 2-locus linkage experiment. • Critical Region: given a cumulative probability distribution fn. of a test statistic, F(x) say, the critical region for the hypothesis test is the region of rejection in the distribution, i.e. the area under the probability curve where the observed test-statistic value is unlikely to be observed if H0 is true. α = significance level.

  5. HT: Critical Regions and Symmetry • For a symmetric, 2-tailed hypothesis test, each tail of the rejection region has area α/2: P{T < T_L} = P{T > T_U} = α/2. The 1- vs. 2-tailed distinction = uni- or bi-directional alternative hypotheses. • Non-symmetric, 2-tailed: tail areas a and b with a + b = α, a ≠ b. • For a = 0 or b = 0, this reduces to the 1-tailed case.

  6. HT - Critical Values and Significance • Cut-off values for the rejection and acceptance regions = critical values, so a hypothesis test can be interpreted as a comparison between the critical values and the observed hypothesis test statistic. • Significance Level: the p-value is the probability, if H0 is true, of observing a sample outcome at least as extreme as the one obtained; it is computed from the cumulative probability of the test-statistic distribution under H0. For a p-value less than or equal to α, H0 is rejected at significance level α (and at any level down to the p-value).
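
A sketch of the critical-value/p-value comparison applied to the 2-locus linkage test of slide 4, using a normal approximation to the binomial count of recombinants (Python with SciPy assumed; the counts 80 out of 200 are invented for illustration):

```python
import numpy as np
from scipy import stats

# H0: r = 0.5 (no linkage); two-sided alternative.
n, recombinants = 200, 80                 # hypothetical counts
r0 = 0.5
r_hat = recombinants / n

# Normal approximation: U = (r_hat - r0) / sqrt(r0 (1 - r0) / n)
u_obs = (r_hat - r0) / np.sqrt(r0 * (1 - r0) / n)

alpha = 0.05
u_crit = stats.norm.ppf(1 - alpha / 2)            # critical value
p_value = 2 * stats.norm.sf(abs(u_obs))           # two-sided p-value

print(f"observed U = {u_obs:.3f}, critical values = ±{u_crit:.3f}")
print(f"p-value = {p_value:.4f} -> "
      f"{'reject' if p_value <= alpha else 'do not reject'} H0 at alpha = {alpha}")
```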

  7. Related issues in Hypothesis Testing - POWER of the TEST • Probability of false positive and false negative errors, e.g. a false positive if linkage between two genes is declared when they are really independent.

     Fact       | Accept H0                              | Reject H0
     H0 True    | 1 − α                                  | False positive = Type I error = α
     H0 False   | False negative = Type II error = β     | Power of the test = 1 − β

  • Power of the Test or Statistical Power = probability of rejecting H0 when correct to do so. (Related strictly to the alternative hypothesis and α.)

  8. Example on Type II Error and Power • Suppose we have a variable with known population S.D. σ = 3.6. From the population, a r.s. of size n = 100 is used to test, at α = 0.05, H0: μ = 17.5. • Critical values of the sample mean for a 2-sided test are μ0 ± U σ/√n, where U = 1.96 for α = 0.05, for i = upper or lower limit, and μ0 = 17.5 under H0. • So substituting our values gives: 17.5 ± 1.96(3.6/√100), i.e. 16.79 and 18.21. • But if H0 is false, μ is not 17.5 but some other value, e.g. 16.5 say??

  9. Example contd. • Want the new distribution with mean μ = 16.5, i.e. the new distribution is shifted w.r.t. the old. • Thus the probability of the Type II error - failing to reject a false H0 - is the area under the curve in the new distribution which overlaps the non-rejection region specified under H0. • So this is β = P{16.79 ≤ X̄ ≤ 18.21 | μ = 16.5} = P{(16.79 − 16.5)/0.36 ≤ U ≤ (18.21 − 16.5)/0.36} = P{0.81 ≤ U ≤ 4.75} = 0.209. • Thus the probability of taking the appropriate action (rejecting H0 when this is false) is 1 − 0.209 = 0.791 = Power, as checked numerically below.
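
The slides' arithmetic can be verified directly (Python with SciPy assumed; all numbers are from slides 8-9):

```python
import numpy as np
from scipy import stats

sigma, n, alpha = 3.6, 100, 0.05
mu0, mu1 = 17.5, 16.5
se = sigma / np.sqrt(n)                           # 0.36

# Non-rejection region under H0: mu0 +/- 1.96 * se -> (16.79, 18.21)
u = stats.norm.ppf(1 - alpha / 2)
lo, hi = mu0 - u * se, mu0 + u * se

# Type II error: probability the sample mean lands in the
# non-rejection region when the true mean is mu1 = 16.5.
beta = stats.norm.cdf(hi, loc=mu1, scale=se) - stats.norm.cdf(lo, loc=mu1, scale=se)
print(f"non-rejection region: ({lo:.2f}, {hi:.2f})")
print(f"beta = {beta:.3f}, power = {1 - beta:.3f}")   # ~0.21, ~0.79
```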

  10. Shifting the distributions [Figure: H0 sampling distribution centred at 17.5 with rejection regions of area α/2 beyond the critical values 16.79 and 18.21 and the non-rejection region between them; the alternative distribution, centred at 16.5, overlaps the non-rejection region.]

  11. Example contd. - Power under the alternative μ for given α

     Possible values of μ | β under H1 (H0 false) | 1 − β
     16.0                 | 0.0143                | 0.9857
     16.5                 | 0.2090                | 0.7910
     17.0                 | 0.7190                | 0.2810
     18.0                 | 0.7190                | 0.2810
     18.5                 | 0.2090                | 0.7910
     19.0                 | 0.0143                | 0.9857

  • Balancing α and β: β tends to be large c.f. α unless the original hypothesis is way off. So a decision based on a rejected H0 is more conclusive than one based on an H0 not rejected, as the probability of being wrong is larger in the latter case.
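
The table can be regenerated from the slide-8 set-up (Python with SciPy assumed; values agree with the table to rounding of the critical values):

```python
import numpy as np
from scipy import stats

sigma, n, alpha, mu0 = 3.6, 100, 0.05, 17.5
se = sigma / np.sqrt(n)
u = stats.norm.ppf(1 - alpha / 2)
lo, hi = mu0 - u * se, mu0 + u * se               # non-rejection region

for mu in [16.0, 16.5, 17.0, 18.0, 18.5, 19.0]:
    beta = stats.norm.cdf(hi, mu, se) - stats.norm.cdf(lo, mu, se)
    print(f"mu = {mu:4.1f}  beta = {beta:.4f}  power = {1 - beta:.4f}")
```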

  12. SAMPLE SIZE DETERMINATION • Example: Suppose we wanted to design a genetic mapping experiment; usually the mating design (or population mapping type: cf. conventional experimental design - ANOVA), genetic marker type and sample size are considered. Questions might include: What is the statistical power to detect linkage for a certain progeny size? What is the precision of the estimated R.F. when the sample size is N? • Sample size needed for a specific Statistical Power • Sample size needed for a specific Confidence Interval

  13. Sample size - calculation based on C.I. For some parameter θ for which the normal approximation approach is valid, the C.I. is θ̂ ± U σ/√n, where U = the standardised normal deviate (S.N.D.) for α/2, and the range from lower to upper limit, 2δ say, is just a precision measurement for the estimator. Given a true parameter θ, the half-width is δ = U σ/√n. So manipulation gives: n = (U σ / δ)².
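
A sketch of this calculation (Python with SciPy assumed; the figures σ = 3.6 and half-width δ = 0.5 are hypothetical, chosen only to show the rounding-up step):

```python
import math
from scipy import stats

def sample_size_ci(sigma, delta, alpha=0.05):
    """n needed so the half-width of the (1 - alpha) C.I. is at most
    delta: n = (U * sigma / delta)^2, rounded up to an integer."""
    u = stats.norm.ppf(1 - alpha / 2)
    return math.ceil((u * sigma / delta) ** 2)

# Hypothetical: sigma = 3.6, half-width 0.5, 95% confidence.
print(sample_size_ci(3.6, 0.5))   # 200 (= ceil of 199.1...)
```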

  14. Sample size - calculation based on Power (firstly, what affects power?) • Suppose α = 0.05, σ = 3.5, n = 100, testing H0: μ0 = 25 when the true μ = 24; assume H1: μ1 < 25 - a one-tailed test (U = 1.645). Critical sample mean under H0 = 25 − 1.645(3.5/√100) = 24.42. Under H1, Power = P{X̄ < 24.42 | μ = 24} = P{U < 1.21} = 0.50 + 0.39 = 0.89. • Note: a two-sided test at α = 0.05 gives critical values 25 ± 1.96(0.35) under H0; equivalently U_L = +0.89, U_U = +4.82 for H1 (i.e. substitute for the limits in the equation and then recalculate for the new μ = μ1 = 24). So P{do not reject H0: μ = 25 when true mean μ = 24} = P{0.89 ≤ U ≤ 4.82} = 0.1867 = β (Type II). Thus Power = 1 − 0.1867 = 0.8133.
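
A sketch reproducing both calculations on this slide (Python with SciPy assumed; all numbers are the slide's):

```python
import numpy as np
from scipy import stats

sigma, n = 3.5, 100
mu0, mu1 = 25.0, 24.0
se = sigma / np.sqrt(n)                         # 0.35

# One-tailed test, alpha = 0.05: reject if xbar < mu0 - 1.645 * se
crit = mu0 - stats.norm.ppf(0.95) * se
power_1t = stats.norm.cdf(crit, loc=mu1, scale=se)
print(f"one-tailed critical mean = {crit:.2f}, power = {power_1t:.4f}")  # ~0.89

# Two-sided test, alpha = 0.05: non-rejection region mu0 +/- 1.96 * se
u = stats.norm.ppf(0.975)
lo, hi = mu0 - u * se, mu0 + u * se
beta = stats.norm.cdf(hi, mu1, se) - stats.norm.cdf(lo, mu1, se)
print(f"two-sided: beta = {beta:.4f}, power = {1 - beta:.4f}")           # ~0.19, ~0.81
```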

  15. Sample Size and Power contd. • Suppose n = 25, other values the same; 1-tailed now: critical mean = 25 − 1.645(3.5/5) = 23.85, so Power = P{X̄ < 23.85 | μ = 24} = 0.4129. • Suppose α = 0.01 (n = 100): critical values, 2-tailed, are 25 ± 2.576(0.35), with, equivalently, U_L = +0.29, U_U = +5.43 under H1. So P{do not reject H0: μ = 25 when true mean μ = 24} = P{0.29 ≤ U ≤ 5.43} = 0.3859 = β, and Power = 1 − 0.3859 = 0.6141. • FACTORS: α, n, type of test (1- or 2-sided) and true parameter value, where subscripts 0 and 1 refer to null and alternative, and the α value may refer to the 1-sided or the 2-sided value.
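
The same computation with this slide's variations, showing how n and α drive power (Python with SciPy assumed):

```python
import numpy as np
from scipy import stats

sigma, mu0, mu1 = 3.5, 25.0, 24.0

# n = 25, one-tailed, alpha = 0.05: smaller n -> lower power
se = sigma / np.sqrt(25)
crit = mu0 - stats.norm.ppf(0.95) * se
print(f"n=25: power = {stats.norm.cdf(crit, mu1, se):.4f}")        # ~0.41

# n = 100, two-sided, alpha = 0.01: stricter alpha -> lower power
se = sigma / np.sqrt(100)
u = stats.norm.ppf(1 - 0.01 / 2)
lo, hi = mu0 - u * se, mu0 + u * se
beta = stats.norm.cdf(hi, mu1, se) - stats.norm.cdf(lo, mu1, se)
print(f"alpha=0.01: beta = {beta:.4f}, power = {1 - beta:.4f}")    # ~0.39, ~0.61
```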

  16. ‘Other’ ways to estimate/test: NON-PARAMETRICS/DIST’N FREE • Standard pdfs do not apply to the data, sampling distributions or test statistics - uncertain due to small or unreliable data sets, non-independence, etc. Parameter estimation - not the key issue. • Example/empirical basis. Weaker assumptions. Less ‘information’, e.g. the median used. Simple hypothesis testing as opposed to estimation. Power and efficiency are issues. Counts - nominal, ordinal (natural non-parametric data types). • Nonparametric hypothesis tests (parallels to the parametric case), e.g. H.T. of locus orders requires complex test-statistic distributions, so need to construct an empirical pdf. Usually the distribution under the null hypothesis is built using re-sampling techniques, e.g. permutation tests, bootstrap, jackknife - see the sketch below.
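
A minimal two-sample permutation test, as one example of building an empirical null distribution by re-sampling (Python with NumPy assumed; the difference-in-medians statistic and the simulated samples are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(7)


def permutation_test(a, b, statistic, n_perm=10_000, rng=rng):
    """Two-sample permutation test: empirical two-sided p-value under
    the null that the group labels are exchangeable."""
    observed = statistic(a, b)
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if abs(statistic(perm[:len(a)], perm[len(a):])) >= abs(observed):
            count += 1
    return observed, (count + 1) / (n_perm + 1)   # add-one correction


diff_medians = lambda a, b: np.median(a) - np.median(b)

# Illustrative samples with a genuine location shift
a = rng.normal(0.0, 1.0, size=20)
b = rng.normal(0.8, 1.0, size=20)
obs, p = permutation_test(a, b, diff_medians)
print(f"observed median difference = {obs:.3f}, permutation p-value = {p:.4f}")
```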

  17. LIKELIHOOD - DEFINITIONS • Suppose X can take a set of values x1, x2, … with P{X = x} = f(x; θ), where θ is a vector of parameters affecting the observed x’s, e.g. θ = (μ, σ²). So we can say something about P{X} if we know, say, θ. • But this is not usually the case, i.e. we observe the x’s, knowing nothing of θ. • Assuming the x’s are a random sample of size n from a known distribution, then the likelihood for θ is L(θ | x1, …, xn) = ∏ f(xi; θ). Finding the most likely θ for the given data is equivalent to maximising the likelihood function (where the M.L.E. is θ̂).

  18. LIKELIHOOD - SCORE and INFO. CONTENT • The log-likelihood is a support function [S(θ)] evaluated at a point, θ′ say. • The support function for any other point, say θ′′, can also be obtained - the basis for computational iterations for the MLE, e.g. Newton-Raphson. • SCORE = first derivative of the support function w.r.t. the parameter, ∂S(θ)/∂θ, or, numerically/discretely, [S(θ2) − S(θ1)]/(θ2 − θ1). • INFORMATION CONTENT = −∂²S(θ)/∂θ², evaluated at (i) an arbitrary point = Observed Info.; (ii) the support function maximum = Expected Info.

  19. Example - Binomial variable (e.g. use of Score, Expected Info. Content to determine the type of mapping population and sample size for genomics experiments). Likelihood function L(p) = (n choose x) pˣ(1 − p)ⁿ⁻ˣ. Log-likelihood S(p) = log(n choose x) + x log p + (n − x) log(1 − p); assume n constant, so the first term is invariant w.r.t. p. Maximising w.r.t. p, i.e. set the derivative of S w.r.t. p = 0: SCORE = x/p − (n − x)/(1 − p) = 0, so the M.L.E. is p̂ = x/n. How does it work, why bother?
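
A sketch putting slides 18-19 together: Newton-Raphson on the score, with the observed information as the divisor, converging to the closed-form p̂ = x/n (Python; the counts 68 out of 200 are illustrative):

```python
# Illustrative binomial data: x successes in n trials
n, x = 200, 68

def score(p):
    """dS/dp = x/p - (n - x)/(1 - p)"""
    return x / p - (n - x) / (1 - p)

def observed_info(p):
    """-d2S/dp2 = x/p^2 + (n - x)/(1 - p)^2"""
    return x / p**2 + (n - x) / (1 - p)**2

p = 0.5                                     # starting value
for it in range(20):
    step = score(p) / observed_info(p)      # Newton-Raphson update
    p += step
    if abs(step) < 1e-10:
        break

print(f"MLE after {it + 1} iterations: {p:.6f} (closed form x/n = {x / n:.6f})")
```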

  20. Numerical Examples: See some data sets and test examples: Simple: http://warnercnr.colostate.edu/class_info/fw663/BinomialLikelihood.PDF Context: http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_1.html All sections useful, but especially examples, sections 1-3 and 6 Also, e.g. for R http://www.montana.edu/rotella/502/binom_like.pdf

  21. Bayesian Estimation - in context • Parametric estimation - in the “classical approach”, f(x; θ) for a r.v. X of density f(x), with θ the unknown parameter → dependency of the distribution on the parameter to be estimated. • Bayesian estimation - θ is a random variable, so we can consider the density as conditional and write f(x | θ). Given a random sample X1, X2, …, Xn, the sample random variables are jointly distributed with the parameter r.v. θ. So the joint pdf is f(x1, x2, …, xn | θ)π(θ). • Objective - to form an estimator that gives a value of θ, dependent on observations of the sample random variables. Thus the conditional density of θ given X1, X2, …, Xn also plays a role. This is the posterior density.

  22. Bayes - contd. • Posterior density: f(θ | x1, …, xn) = f(x1, …, xn | θ)π(θ) / ∫ f(x1, …, xn | θ)π(θ) dθ • Relationship - prior and posterior: posterior ∝ likelihood × prior, where π(θ) is the prior density of θ. • Value: close to the MLE for large n, or for small n if the sample values are compatible with the prior distribution. Also has a strong sample basis, and is simpler to calculate than the M.L.E.
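
A sketch of the prior-to-posterior update for a conjugate case - Beta prior, binomial likelihood, so the posterior is Beta(a + x, b + n − x) - also showing the posterior mean approaching the MLE x/n as n grows, as the slide states (Python; the prior parameters and counts are illustrative):

```python
# Beta(a, b) prior on theta; binomial data: x successes in n trials.
a, b = 2.0, 2.0                                  # illustrative prior
for n, x in [(10, 4), (100, 40), (1000, 400)]:
    a_post, b_post = a + x, b + (n - x)          # conjugate update
    post_mean = a_post / (a_post + b_post)       # posterior is Beta(a_post, b_post)
    print(f"n = {n:4d}: posterior mean = {post_mean:.4f}, MLE x/n = {x / n:.4f}")
```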

  23. Estimator Comparison in brief • Classical: uses objective probabilities, intuitive estimators, additional assumptions for sampling distributions; good properties for some estimators. • Moment: less calculation, less efficient. Not common in genomic analysis, despite analytical solutions & low bias, because of poorer asymptotic properties; even simple solutions may not be unique. • Bayesian: subjective prior knowledge plus sample info.; close to the MLE under certain conditions - see earlier. • LSE: if assumptions are met, the β̂’s are unbiased and their variances obtained from σ²(XᵀX)⁻¹. Few assumptions for response-variable distributions - just expectations and the variance-covariance structure (unlike MLE, where one needs to specify the joint prob. distribution of the variables). Requires additional assumptions for sampling dist’ns. Close to MLE if assumptions met. Computation easier.

  24. Addendum: MGF - estimation use • Mostly for functions of random variables. • Normal: reproductive property (as for the Poisson and χ²); X ~ N(μ, σ²) actually has M_X(t) = exp(μt + σ²t²/2). • Importantly, for X1, X2, …, Xn independent r.v.’s with normal distributions with means μi and variances σi², the r.v. Y = X1 + X2 + ⋯ + Xn is ~ N with mean Σμi and variance Σσi².
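
A quick simulation checking the reproductive property (Python with NumPy assumed; the three means and standard deviations are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
mus = np.array([1.0, -2.0, 4.0])          # illustrative means
sigmas = np.array([0.5, 1.0, 2.0])        # illustrative s.d.s

# Y = X1 + X2 + X3 with Xi ~ N(mu_i, sigma_i^2), independent
samples = rng.normal(mus, sigmas, size=(200_000, 3)).sum(axis=1)

print(f"sample mean {samples.mean():.3f}  vs  sum of means {mus.sum():.3f}")
print(f"sample var  {samples.var():.3f}  vs  sum of variances {(sigmas**2).sum():.3f}")
```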
