
Session III. Introduction to Probability and Testing for Goodness of Fit:



  1. Session III. Introduction to Probability and Testing for Goodness of Fit: Or An IDEA about Probability and Testing! (Zar: Chapters 5, 22)

  2. What is Probability?
  • (1) Not formally defined, much like a point is not defined in geometry; but:
  • (2) Probability is a measure of the “chance” that an “event” will happen.
  • (a) “What’s the probability it will rain?” (subjective)
  • (b) “What’s the probability that the coin will be ‘heads’ when I flip it?” (objective)
  • (3) Measured between 0 and 100%, or between 0 and 1.

  3. If “A” is the event, then P(A) is the probability of A. The estimate of P(A) is written P̂(A). How do you get a value for P(A)?

  4. What are “Odds”?

  5. Events A and B in a universe: disjoint, or mutually exclusive.
  • Independent: when one event does not affect another. Ex: coin flips; random selection from an infinite set, or selection with replacement from a finite set.

  6. Dependent: when one event affects another. Ex: checker flips; random selection without replacement from a finite set. Ex: colored balls in a bag.
  • Joint outcomes
  • Independent: Multiply!

  7. What is the chance of each outcome?
  • If mutually exclusive? Add!
  • The 1st & 2nd toss are independent!
  • The outcomes (H,H), (H,T), (T,H), and (T,T) are mutually exclusive!
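As a minimal check of the multiply-then-add rule, here is a Python sketch; the fair coin and its 0.5 probabilities are just illustrative:

```python
from itertools import product

# Two fair, independent coin flips.
# Independence: multiply probabilities within one joint outcome.
p = {"H": 0.5, "T": 0.5}
outcomes = {(a, b): p[a] * p[b] for a, b in product("HT", repeat=2)}

# Each of (H,H), (H,T), (T,H), (T,T) has probability 0.25.
print(outcomes)
# Mutual exclusivity: the probabilities of the four outcomes add to 1.
print(sum(outcomes.values()))
```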

  8. A Simple Hypothesis. The “Binomial”: any experiment with just two outcomes. Ex: flower color. Suppose only yellow or green flowers.
  Parents: (Y Y) × (g g)
  1st cross: (Y g) (Y g)
  2nd cross: (Y Y) (Y g) (g Y) (g g)

  9. Hypothesis: Yellow is dominant.
  Genotypes: Y Y, Y g, g Y, g g, each with probability 1/4.
  Phenotype of each (result): Y, Y, Y, g, so the probability of yellow is 3/4 and of green is 1/4.

  10. THE EXPERIMENT: Select 100 flowers at random.
  Under H0, expected: Yellow 75 (3/4 × 100), Green 25 (1/4 × 100).
  Result: Yellow 84, Green 16.
  The Problem: Is (84, 16) consistent with the hypothesis? Does (84, 16) support a probability of 75%, 25%?

  11. Answer, in the form of a question:
  • What’s the probability that (84, 16) could come from a true population of 75%, 25%?
  H0: p = 75%. The tool: the Binomial Distribution.

  12. The Binomial Distribution: Take it slow!
  • n = 1: P(Y) = .75, P(g) = .25
  • n = 2:
  P(YY) = P(Y) × P(Y) = .75 × .75 = .5625
  P(Yg) = P(gY) = P(Y) × P(g) = .75 × .25 = .1875
  P(gg) = P(g) × P(g) = .25 × .25 = .0625
  BUT… P(YY) + P(Yg) + P(gg) = .8125 ≠ 1

  13. What’s wrong? • We have the possibility of both Yg + gY)! • Should be • P(Y Y) + P (Y g) + P (g Y) + P (g g)  1 • Which is • # Probability • (1) P(YY) = .252 • (2) P(Yg) = P(gY) = .75 x .25 • (1) P(gg) = .252 Ex: P(at least one Y) = P(YY) + P(gY) + P(Yg) = .75

  14. The Binomial Distribution (cont.)
  • n = 3:
  (1) P(Y Y Y) = .75³
  (3) P(Y Y g) = P(Y g Y) = P(g Y Y) = .75² × .25
  (3) P(Y g g) = P(g Y g) = P(g g Y) = .75 × .25²
  (1) P(g g g) = .25³

  15. In general, for x = 0, 1, …, n:
  P(x Y’s, (n − x) g’s) = #ways(x Y’s; (n − x) g’s) × P(Y)ˣ × P(g)ⁿ⁻ˣ,
  where #ways(x Y’s; (n − x) g’s) = C(n, x) = n! / (x! (n − x)!)
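The general formula can be sketched in Python; `binom_pmf` is a helper name chosen here, using the standard library’s `math.comb` for the number of orderings:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(x successes in n trials), each trial with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# n = 2 flowers with P(Y) = .75, matching slide 12:
print(binom_pmf(2, 2, 0.75))   # P(YY) = 0.5625
print(binom_pmf(1, 2, 0.75))   # P(one Y, one g) = 0.375  (the Yg and gY orderings combined)
print(binom_pmf(0, 2, 0.75))   # P(gg) = 0.0625
```

Note that the x = 1 term already counts both orderings (Yg and gY), which is exactly what the "What's wrong?" slide fixes by hand.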

  16. By Pascal’s Triangle:
  1
  1 1
  1 2 1
  1 3 3 1
  1 4 6 4 1
  1 5 10 10 5 1
  1 6 15 20 15 6 1
  1 7 21 35 35 21 7 1
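The rows above can be generated by the Pascal recurrence (each interior entry is the sum of the two entries above it); a short sketch:

```python
def pascal_rows(n_rows):
    """Return the first n_rows rows of Pascal's triangle as lists of ints."""
    rows = [[1]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        # Interior entries: sum of adjacent pairs in the previous row.
        rows.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
    return rows

for row in pascal_rows(8):
    print(row)
# Row n holds the binomial coefficients C(n, 0) ... C(n, n).
```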

  17. Calculate the Expectation (mean) of a binomial:
  E[X] = Σ x · P(x) = n p
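A quick numerical check that the enumeration definition of the mean agrees with the closed form np (function names here are illustrative):

```python
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_mean(n, p):
    """E[X] computed directly as the sum of x * P(x) over all x."""
    return sum(x * binom_pmf(x, n, p) for x in range(n + 1))

# For the flower experiment, n = 100 and P(Y) = .75:
print(binom_mean(100, 0.75))   # matches n*p = 75 yellow flowers expected
```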

  18. For n = 100, P(g) = .25, where x = number of green flowers:
  x    P(x g’s)     P(≤ x g’s)
  0    0.00000000   0.00000000
  1    0.00000000   0.00000000
  2    0.00000000   0.00000000
  3    0.00000000   0.00000000
  4    0.00000002   0.00000002
  5    0.00000010   0.00000012
  6    0.00000052   0.00000064
  7    0.00000235   0.00000299
  8    0.00000910   0.00001209
  9    0.00003100   0.00004308
  10   0.00009402   0.00013710
  11   0.00025642   0.00039352
  12   0.00063392   0.00102744
  13   0.00143038   0.00245782
  14   0.00296294   0.00542076
  15   0.00566251   0.01108327
  16   0.01002735   0.02111062
  17   0.01651564   0.03762627
  18   0.02538515   0.06301142
  19   0.03651899   0.09953041
  20   0.04930064   0.14883105
  21   0.06260399   0.21143505
  22   0.07493508   0.28637013
  23   0.08470922   0.37107936
  24   0.09059180   0.46167117
  25   0.09179969   0.55347085
  26   0.08826894   0.64173979
  27   0.08064076   0.72238052
  28   0.07008065   0.79246116
  29   0.05799779   0.85045892
  30   0.04575381   0.89621270
  31   0.03443835   0.93065107
  32   0.02475256   0.95540363
  33   0.01700176   0.97240537
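This table can be regenerated with a few lines of Python, a sketch of the exact binomial with n = 100 and P(g) = .25:

```python
from math import comb

n, p_green = 100, 0.25

def pmf(x):
    """P(exactly x green flowers out of n)."""
    return comb(n, x) * p_green**x * (1 - p_green)**(n - x)

cum = 0.0
for x in range(17):
    cum += pmf(x)
    print(f"{x:3d}  {pmf(x):.8f}  {cum:.8f}")
# The cumulative probability at x = 16 is the tail used on the next slide.
```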

  19. Conclusions:
  • Chance of (84, 16) = chance of getting (84, 16) or anything “rarer” than (84, 16)
  = P(84, 16) + P(85, 15) + P(86, 14) + … + P(100, 0) = .0211
  What is rare enough? Biomedical convention: .05 or 5%.
  RULE: If the experiment is rarer than the cutoff level, say that the experiment is not consistent with the hypothesis! If less rare, say it is consistent!

  20. Other Cutoffs:
  Lower: .01 or 1% (1/100) for situations needing a lower error rate; .001 or .1% (1/1000). Example: the Bruston Explosive Bolt.
  Or higher:
  • Example: physiological studies
  • Example: secondary mets. in a pediatric leukemia study
  Conclusion: No one cutoff works in every situation. The cutoff should be set beforehand to avoid bias.

  21. What is the cutoff? What is the p-value?
  [Figure: value in the universe vs. the statistic under the H0.]
  Chance[experiment statistic ≤ cutoff], i.e., Pr[X ≤ x | H0] ≤ cutoff probability.
  If the probability statement is true, then decide that the experiment is not consistent with the hypothesis. But there’s still a chance the experiment came from the H0!

  22. Type I error = α = Pr[decide not-H0 | H0 is true]
  Type II error = β = Pr[decide H0 | H0 is not true]
  Three numbers: α, β, cutoff. If you have one, you have them all.

  23. Summary of the Binomial
  • Density function: P(x) = C(n, x) pˣ (1 − p)ⁿ⁻ˣ
  • Distribution function: F(x) = Σ P(i) for i = 0, …, x
  • Mean: n p
  • Variance: n p (1 − p)

  24. Another way to look at flowers:
  H0: Yellow is dominant. HA: Yellow is not dominant.
                    Y     g     Total
  Observed:         84    16    100
  Expected percent: 75%   25%
  Expected number:  75    25    100   (n × proportion)
  Chi-square: (84 − 75)²/75 + (16 − 25)²/25 = 1.08 + 3.24 = 4.32
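The chi-square arithmetic on this slide can be sketched as follows; `chi_square` is a helper name chosen here, not from Zar:

```python
def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

obs = [84, 16]
exp = [100 * 0.75, 100 * 0.25]   # 75 and 25 under H0
print(chi_square(obs, exp))       # 1.08 + 3.24 = 4.32, with df = 2 - 1 = 1
```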

  25. Degrees of freedom = # terms − 1.
  For the (84, 16) example: degrees of freedom = 2 − 1 = 1.
  So how extreme is 4.32?

  26. [Figure: χ² density with a cutoff; values below the cutoff support H0, values above it reject H0.]

  27. Table B.1 (χ², 1 DF):
  p-value   x
  0.01      6.635
  0.025     5.024
  0.05      3.841
  Our statistic, 4.32, falls between 3.841 and 5.024, so .025 < p < .05.

  28. Table B.1:
  DF   p-value   x
  1    0.05      3.841
  4    0.05      9.488
  10   0.05      18.307
  20   0.05      31.410

  29. Another Example: More than 2 groups.
  Color: green & yellow. Texture: smooth & wrinkled.
  Hypothesis: Y is dominant; S is dominant; color and texture are independent.
  Pr(any cell) = 1/16.

  30.            YS        Yw        gS        gw        Total
  H0 ratio:      9         3         3         1         16
  Obs:           152       39        53        6         250
  Pr(H0):        9/16      3/16      3/16      1/16
                 (0.5625)  (0.1875)  (0.1875)  (0.0625)
  Expected:      140.625   46.875    46.875    15.625
  χ²: (152 − 140.625)²/140.625 + (39 − 46.875)²/46.875 + (53 − 46.875)²/46.875 + (6 − 15.625)²/15.625
  = 0.9201 + 1.3230 + 0.8003 + 5.929 = 8.97
  DF: 1 + 1 + 1 + 1 − 1 = 3
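The same computation for the 9 : 3 : 3 : 1 hypothesis, as a sketch (the `chi_square` helper name is chosen here, not from Zar):

```python
def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

obs = [152, 39, 53, 6]                         # YS, Yw, gS, gw
n = sum(obs)                                   # 250 plants
exp = [n * r / 16 for r in (9, 3, 3, 1)]       # 140.625, 46.875, 46.875, 15.625
stat = chi_square(obs, exp)
print(round(stat, 2))                          # df = 4 - 1 = 3
```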

  31. So, where’s the Difference? or, Subdividing the H0:
  (1) too few gw; (2) about the right # of the others.
  Combine YS + Yw + gS and compare to gw. But first, test (2):
  • H0: YS, Yw, gS in 9 : 3 : 3
                 YS          Yw          gS          Total
  Obs:           152         39          53          244
  Pr(H0):        9/15 = .6   3/15 = .2   3/15 = .2
  Exp:           146.4       48.8        48.8
  • χ²: .2142 + 1.968 + 0.3615 = 2.544
  • DF = 1 + 1 + 1 − 1 = 2
  • χ².05(2) = 5.991  Accept H0

  32. H0: Others vs. gw in 15 : 1
                 Others          gw              Total
  Obs:           244             6               250
  Pr(H0):        15/16 = .9375   1/16 = .0625
  Expected:      234.375         15.625
  • χ²: 0.3953 + 5.929 = 6.324
  • DF = 1 + 1 − 1 = 1
  • χ².05(1) = 3.841  Reject H0 and accept “too few gw”

  33. Summary:
  • χ² = Σ (Obsᵢ − Expᵢ)² / Expᵢ, summed over the k categories
  • DF = k − 1

  34. (2) Rule of thumb for using χ² instead of the exact binomial:
  • If no more than 25% of the expected frequencies are < 5, and none is ≤ 1, then use χ².
  • Others:
  (1) Continuity correction (Yates correction)
  (3) Log-likelihood ratio (entropy / information)

  35. (4) Heterogeneity Chi-Square
  There is often a need to combine chi-square analyses. The common cause is a batch effect, where only a certain number of subjects can be handled at a time (e.g., cages, school classes, laboratories, or clinics), but there is a common hypothesis over all batches (e.g., gender, ethnicity, presence/absence of a marker).
  (a) Perform a chi-square on each “batch”.
  (b) Pool all batches and do a “pooled” chi-square.
  (c) Sum the individual chi-squares (d.f. = sum of the individual batch d.f. = k batches × the d.f. for each batch).
  (d) Subtract the pooled chi-square from the sum and test with (k − 1) × (individual batch d.f.). This is the heterogeneity chi-square.
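Steps (a)–(d) can be sketched in Python; the two-batch 3 : 1 data below are hypothetical, invented only to exercise the procedure:

```python
def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

ratios = (3, 1)                        # common H0 in every batch: 3:1 yellow:green
batches = [[70, 30], [80, 20]]         # hypothetical counts, one row per batch

def expected(obs):
    n = sum(obs)
    return [n * r / sum(ratios) for r in ratios]

# (a)+(c): chi-square per batch, then sum them.
total = sum(chi_square(b, expected(b)) for b in batches)
# (b): pool the counts across batches, one pooled chi-square.
pooled_obs = [sum(col) for col in zip(*batches)]
pooled = chi_square(pooled_obs, expected(pooled_obs))
# (d): the heterogeneity chi-square and its degrees of freedom.
heterogeneity = total - pooled
k, df_each = len(batches), len(ratios) - 1
print(heterogeneity, (k - 1) * df_each)
```

The difference is never negative: pooling can only smooth out disagreement between batches, so the summed batch chi-squares are at least as large as the pooled one.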

  36. Ex 22.5: Heterogeneity chi-square analysis. G. Mendel’s data.
  Difference, total − pooled:

  37. Ex 22.6: Heterogeneity Chi-Square.
  Difference, total − pooled:

  38. Problem Set 1:
  • II. SC Exchanges in Lymphocytes
  • Table 4. Distribution of exchanges between chromosomes
  • Columns: Chromosome | Total Length | Relative Length | Proportional Exchanges | Observed
  TEST: H0: Exchanges are proportional to the length of the chromosome.
