1 / 20

Using R to Compute Probabilities of Matching Birthdays

Using R to Compute Probabilities of Matching Birthdays. Bruce E. Trumbo* Eric A. Suess Clayton W. Schupp †. California State University, Hayward JSM, August 2004 † Presenting at JSM * btrumbo@csuhayward.edu. Statement of Birthday Problem. 25 randomly chosen people in a room

seven
Download Presentation

Using R to Compute Probabilities of Matching Birthdays

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Using R to Compute Probabilities of Matching Birthdays

  2. Bruce E. Trumbo*Eric A. SuessClayton W. Schupp† California State University, Hayward JSM, August 2004 † Presenting at JSM * btrumbo@csuhayward.edu

  3. Statement of Birthday Problem • 25 randomly chosen people in a room • What is the probability two or more of them have the same birthday? • Model simplifications for a simple combinatorial solution: • Ignore leap years • Assume all 365 days equally likely • What error from assumptions?

  4. Let X = number of matches Seek P{X≥ 1} = 1 – P{X = 0} Seek P{X≥ 1} = 1 – 0.4313 = 0.5687 In R: prod(1 - (0:24)/365) Returns 0.4313003 Combinatorial Solution

  5. For n People in the Room • As n increases, clearly P{X≥ 1} increases. • If n = 366 (still ignoring leap years), we are sure to get at least one duplication. • But we will see P{X≥ 1} becomes very close to 1 for much smaller values of n. • Easy to show using a loop in R.

  6. R Script for Graph of P{Match} p <- numeric(50) for (n in 1:50){ q <- 1 - (0:(n - 1))/365 p[n] <- 1 - prod(q) } plot(p) # Makes Figure 1 p # Makes prinout below > p [1] 0.000000000 0.002739726 ... [21] 0.443688335 0.475695308 [23] 0.507297234 0.538344258 [25] 0.568699704 0.598240820 ... [49] 0.965779609 0.970373580

  7. Simulating the Birthday Process • Use R to “survey” from m = 10000 rooms, each with n = 25 randomly chosen people. • Two convenient functions in R: b <- sample(1:365, n, repl=T) samples 25 objects from among {1, 2, …, 365} with replacement, length(unique(b)) counts the number of unique objects.

  8. R Code Simulating the Birthday Process n <- 25; m <- 10000; x <- numeric(m) for (i in 1:m) { b <- sample(1:365, n, repl=T) x[i] <- n - length(unique(b)) } mean(x); mean(x==0) cut <- (0:(max(x) + 1)) - 0.5 hist(x, breaks=cut, freq=F, col=8) > mean(x) [1] 0.8081 > mean(x==0) [1] 0.4338

  9. Comments on Simulation Results > mean(x) [1] 0.8081 The proportion of rooms with no matches is very close to P{X = 0} = 0.4313. > mean(x==0) [1] 0.4338 The average number of matches per room simulates E(X), which is not easily obtained by combinatorial methods.

  10. How Serious Are Simplifying Assumptions? • We hope the 25 people in our room are randomly chosen. (For example, not a convention of twins, or Sagittarians.) • We know there are February 29th birthdays and that other birthdays are not uniformly distributed. We hope this does not matter much.

  11. Simulation Easy to Generalize Define a vector of weights: w <- c(rep(4,200), rep(5,165), 1) More extreme variation than actual. Sample according to these weights: sample(1:366, n, repl=T, prob=w) Slight modification of the R code simulates the nonuniform problem.

  12. R Code Simulating the Birthday Process With Weighted Probabilities n <- 25; m <- 20000; x <- numeric(m) w <- c(rep(4,200), rep(5,165), 1) for (i in 1:m) { b <- sample(1:366, n, repl=T, prob=w) x[i] <- n - length(unique(b)) } mean(x); mean(x==0) Uniform Simulation > mean(x) [1] 0.819 [1] 0.8081 > mean(x==0) [1] 0.42215 [1] 0.4338

  13. Breakdown in Birthday Problem • We have seen that mild departures from the uniform do not change substantially. • What about for extreme departures from the uniform? • Assume that half the birthdays are distributed over a period of k weeks (k = 1, . . . , 26). • Remainder are uniformly distributed over the rest of the year.

  14. Simulated Departures From Uniform(Red Line Indicates Uniform Distribution)

  15. Concluding Remarks • Simulation allows us to assess assumptions. Here the result was that realistic departures from the uniform are relatively harmless. • Simulating a case with a known answer builds confidence in the simulation model to be used for generalizing results. • Using R, simulations are easy enough to use in practical applications and in the classroom.

  16. References Peter Dalgaard: Introductory Statistics with R, 2002, Springer. William Feller: An Introduction to Probability Theory and Its Applications, Vol. 1, 1950 (3rd ed., 1968), Wiley. Thomas. S. Nunnikhoven: A birthday problem solution for nonuniform frequencies, The American Statistician, 46, 270–274, 1992. Jim Pitman and Michael Camarri: Limit Distributions and Random Trees Derived From the Birthday Problem With Unequal Probabilities,Electron. J. Probab. 5 Paper 2, 1-18, 2000

  17. Web Resources R software and manuals are available from www.r-project.org Birth frequencies are based on raw monthly totals are available at www.cdc.gov/nchs/products/pubd/vsus/vsus.html Auxiliary instructional materials for this article are posted at www.sci.csuhayward.edu/~btrumbo/bdmatch

More Related