Using R to Compute Probabilities of Matching Birthdays

Bruce E. Trumbo*, Eric A. Suess, Clayton W. Schupp†

California State University, Hayward

JSM, August 2004

† Presenting at JSM

* [email protected]


Statement of Birthday Problem

  • 25 randomly chosen people in a room

  • What is the probability two or more of them have the same birthday?

  • Model simplifications for a simple combinatorial solution:

    • Ignore leap years

    • Assume all 365 days equally likely

  • What error from assumptions?


Combinatorial Solution

Let X = number of matches

Seek P{X ≥ 1} = 1 – P{X = 0}

P{X = 0} = (365/365)(364/365) ··· (341/365) = 0.4313

So P{X ≥ 1} = 1 – 0.4313 = 0.5687

In R: prod(1 - (0:24)/365)

Returns 0.4313003
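The same product can be wrapped in a small function so that other room sizes are easy to try. This is only a sketch: the name p_match and its days argument are illustrative and do not appear in the slides. The pbirthday() function from R's stats package is used purely as a cross-check.

```r
# P{at least one shared birthday} among n >= 1 people, assuming
# `days` equally likely birthdays. The name p_match is illustrative.
p_match <- function(n, days = 365) {
  1 - prod(1 - (0:(n - 1)) / days)
}

p_match(25)                              # about 0.5687, as on the slide
all.equal(p_match(25), pbirthday(25))    # cross-check with stats::pbirthday
```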


For n People in the Room

  • As n increases, clearly P{X≥ 1} increases.

  • If n = 366 (still ignoring leap years), we are sure to get at least one duplication.

  • But we will see P{X≥ 1} becomes very close to 1 for much smaller values of n.

  • Easy to show using a loop in R.


R Script for Graph of P{Match}

p <- numeric(50)
for (n in 1:50) {
  q <- 1 - (0:(n - 1))/365   # factors of P{no match among n people}
  p[n] <- 1 - prod(q)        # P{at least one match among n people}
}
plot(p)   # Makes Figure 1
p         # Makes printout below

> p

[1] 0.000000000 0.002739726

...

[21] 0.443688335 0.475695308

[23] 0.507297234 0.538344258

[25] 0.568699704 0.598240820

...

[49] 0.965779609 0.970373580
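The loop can also be written without explicit iteration. The sketch below uses cumprod() to build all 50 probabilities at once and then picks out the smallest room size for which a match is more likely than not. It should reproduce the printout above, where p[22] is below 1/2 and p[23] is above it.

```r
n <- 1:50
p <- 1 - cumprod(1 - (n - 1) / 365)   # same values as the loop version
min(n[p > 0.5])                       # smallest n with P{match} > 1/2
```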


Simulating the Birthday Process

  • Use R to “survey” from m = 10000 rooms, each with n = 25 randomly chosen people.

  • Two convenient functions in R:

    • b <- sample(1:365, n, repl=T) samples n = 25 objects from among {1, 2, …, 365} with replacement.

    • length(unique(b)) counts the number of unique objects.
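For a single room, these two functions can be tried directly. The sketch below is illustrative: the seed is arbitrary and only makes the draw reproducible, and duplicated() is added to show which birthdays repeat.

```r
set.seed(1)                             # arbitrary seed, for reproducibility only
n <- 25
b <- sample(1:365, n, replace = TRUE)   # birthdays for one room
length(unique(b))                       # number of distinct birthdays
n - length(unique(b))                   # number of matches in this room
b[duplicated(b)]                        # the repeated birthdays, if any
```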


R Code Simulating the Birthday Process

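n <- 25; m <- 10000; x <- numeric(m)
for (i in 1:m) {
  b <- sample(1:365, n, replace=TRUE)   # birthdays in room i
  x[i] <- n - length(unique(b))         # number of matches in room i
}
mean(x); mean(x==0)                     # estimates of E(X) and P{X = 0}
cutp <- (0:(max(x) + 1)) - 0.5          # bin edges centered on the integers
hist(x, breaks=cutp, freq=FALSE, col=8)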

> mean(x)

[1] 0.8081

> mean(x==0)

[1] 0.4338
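How close should a simulated proportion be to the exact value? A rough answer comes from the binomial standard error of a proportion based on m independent rooms. The sketch below plugs in the figures reported above; the 1.96 multiplier assumes the usual normal approximation.

```r
m     <- 10000
p_hat <- 0.4338                          # simulated P{X = 0} reported above
exact <- prod(1 - (0:24) / 365)          # combinatorial value, about 0.4313
se    <- sqrt(p_hat * (1 - p_hat) / m)   # binomial standard error
c(lower = p_hat - 1.96 * se, upper = p_hat + 1.96 * se)
abs(p_hat - exact) / se                  # discrepancy in standard-error units
```

With m = 10000 the standard error is about 0.005, so the simulated value lies well within sampling error of the exact one.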


Comments on Simulation Results

> mean(x==0)

[1] 0.4338

The proportion of rooms with no matches is very close to P{X = 0} = 0.4313.

> mean(x)

[1] 0.8081

The average number of matches per room simulates E(X), which is not easily obtained by combinatorial methods.
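Although E(X) is harder to reach than P{X = 0} by counting arguments, there is a closed form for the particular X used in the simulation code (n minus the number of distinct birthdays). By linearity of expectation, each of the 365 days is someone's birthday with probability 1 − (364/365)^n, so E(X) = n − 365[1 − (364/365)^n]. The sketch below evaluates this under the uniform model as a second check on the simulation.

```r
n  <- 25
ex <- n - 365 * (1 - (364 / 365)^n)   # E(X) under the uniform 365-day model
ex                                    # roughly 0.80, near the simulated mean
```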


How Serious Are Simplifying Assumptions?

  • We hope the 25 people in our room are randomly chosen. (For example, not a convention of twins, or Sagittarians.)

  • We know there are February 29th birthdays and that other birthdays are not uniformly distributed. We hope this does not matter much.


Simulation Easy to Generalize

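Define a vector of weights:

w <- c(rep(4,200), rep(5,165), 1)

This gives 200 days weight 4, 165 days weight 5, and a 366th day (February 29) weight 1. The variation is more extreme than in actual birth data.

Sample according to these weights:

sample(1:366, n, replace=TRUE, prob=w)

A slight modification of the R code simulates the nonuniform problem.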
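To see what these weights imply, they can be converted to daily probabilities. sample() rescales prob internally, so the weights need not sum to 1; the division below is only for inspection.

```r
w <- c(rep(4, 200), rep(5, 165), 1)
length(w)            # 366 days, including February 29
sum(w)               # 1626
range(w / sum(w))    # smallest and largest daily probabilities
1 / 365              # uniform value, for comparison
```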


R Code Simulating the Birthday Process With Weighted Probabilities

n <- 25; m <- 20000; x <- numeric(m)
w <- c(rep(4,200), rep(5,165), 1)   # relative weights for days 1 to 366
for (i in 1:m) {
  b <- sample(1:366, n, replace=TRUE, prob=w)
  x[i] <- n - length(unique(b))     # number of matches in room i
}
mean(x); mean(x==0)

Results of the weighted simulation beside those of the uniform simulation:

               Weighted    Uniform
> mean(x)      0.819       0.8081
> mean(x==0)   0.42215     0.4338
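The weighted simulation can also be checked against an exact calculation. The probability that n independent draws from daily probabilities p are all distinct equals n! times the n-th elementary symmetric polynomial of p, and this can be built up one day at a time. The sketch below is one way to do that. The function name p_no_match is hypothetical and the approach is not taken from the slides.

```r
# Exact P{X = 0} for arbitrary day weights. f[k + 1] holds
# P{k draws are all distinct} over the days processed so far.
# The function name p_no_match is hypothetical.
p_no_match <- function(n, w) {
  p <- w / sum(w)
  f <- c(1, numeric(n))
  for (p_day in p) {
    # the right-hand side is evaluated before assignment, so old values are used
    f[2:(n + 1)] <- f[2:(n + 1)] + (1:n) * p_day * f[1:n]
  }
  f[n + 1]
}

w <- c(rep(4, 200), rep(5, 165), 1)
p_no_match(25, rep(1, 365))   # uniform case: agrees with prod(1 - (0:24)/365)
p_no_match(25, w)             # weighted case
```

Comparing the two values separates the real effect of the weights from simulation noise.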


Breakdown in Birthday Problem Probabilities

  • We have seen that mild departures from the uniform do not change the probabilities substantially.

  • What about for extreme departures from the uniform?

    • Assume that half the birthdays are distributed over a period of k weeks (k = 1, . . . , 26).

    • Remainder are uniformly distributed over the rest of the year.


Simulated Departures From Uniform Probabilities (Red Line Indicates Uniform Distribution)
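A plot of this kind could be produced along the following lines. This is a sketch under assumptions, not the authors' script: it places half the probability on the first 7k days of a 365-day year, spreads the other half evenly over the remaining days, and uses a small m so that it runs quickly. The seed and the name p_sim are illustrative.

```r
set.seed(2004)                 # arbitrary seed, for reproducibility only
n <- 25; m <- 2000             # m kept small so the sketch runs quickly
p_sim <- numeric(26)
for (k in 1:26) {
  d <- 7 * k                   # days in the concentrated period
  # half the probability on d days, half on the other 365 - d days
  w <- c(rep(0.5 / d, d), rep(0.5 / (365 - d), 365 - d))
  x <- replicate(m, n - length(unique(sample(1:365, n, replace = TRUE, prob = w))))
  p_sim[k] <- mean(x > 0)      # simulated P{at least one match}
}
plot(1:26, p_sim, ylim = c(0.5, 1), xlab = "k (weeks)", ylab = "Simulated P{match}")
abline(h = 1 - prod(1 - (0:24) / 365), col = "red")   # uniform value, about 0.5687
```

At k = 26 the two halves of the year carry almost equal weight per day, so the simulated value should sit near the red line. For small k a match is nearly certain.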


Concluding Remarks

  • Simulation allows us to assess assumptions. Here the result was that realistic departures from the uniform are relatively harmless.

  • Simulating a case with a known answer builds confidence in the simulation model to be used for generalizing results.

  • Using R, simulations are easy enough to use in practical applications and in the classroom.


References

Peter Dalgaard: Introductory Statistics with R, Springer, 2002.

William Feller: An Introduction to Probability Theory and Its Applications, Vol. 1, Wiley, 1950 (3rd ed., 1968).

Thomas S. Nunnikhoven: A birthday problem solution for nonuniform frequencies, The American Statistician, 46, 270–274, 1992.

Jim Pitman and Michael Camarri: Limit Distributions and Random Trees Derived From the Birthday Problem With Unequal Probabilities, Electron. J. Probab., 5, Paper 2, 1–18, 2000.


Web Resources

R software and manuals are available from www.r-project.org

Birth frequencies, based on raw monthly totals, are available at www.cdc.gov/nchs/products/pubd/vsus/vsus.html

Auxiliary instructional materials for this article are posted at www.sci.csuhayward.edu/~btrumbo/bdmatch

