Strategies for Winning: More Monty Hall Problem Analysis

DATA 220 Mathematical Methods for Data Analysis
October 1 Class Meeting
Department of Applied Data Science, San Jose State University
Fall 2019
Instructor: Ron Mak
www.cs.sjsu.edu/~mak

More Monty Hall
• The classic Monty Hall problem has 3 doors and 1 car.
• You can generalize the problem to n doors and k cars, where n ≥ 3 and k ≤ n – 2.
• You can win a car either by staying or by switching.
• You can also get a goat by staying or by switching.
• If you stay with your original door, the probability that you win a car is $\frac{k}{n}$.

More Monty Hall, cont’d
• If you switch doors:
  • $\frac{k}{n}$ of the time, your first door choice is one of the right ones with a car. Monty opens a door to reveal a goat, and the remaining n – 2 doors hide k – 1 cars. Therefore, the probability that your second door choice wins a car is $\frac{k-1}{n-2}$.
  • $\frac{n-k}{n}$ of the time, your first door choice is wrong. Monty opens a door to reveal a goat, and the remaining n – 2 doors hide k cars. Therefore, the probability that your second choice wins is $\frac{k}{n-2}$.

More Monty Hall, cont’d
• Add up the probabilities that you will win a car by switching:
  $$P(\text{win by switching}) = \frac{k}{n}\cdot\frac{k-1}{n-2} + \frac{n-k}{n}\cdot\frac{k}{n-2} = \frac{k(n-1)}{n(n-2)} = \frac{k}{n}\cdot\frac{n-1}{n-2}$$
• The probability of winning by staying is $\frac{k}{n}$.
• Since n – 1 > n – 2, the fraction $\frac{n-1}{n-2} > 1$.
• Therefore, the probability of winning a car is always greater if you switch, although the advantage diminishes as n increases.

More Monty Hall, cont’d
• Do a simulation!

Simulations  Doors  Cars  Stay wins  Stay win %  Switch wins  Switch win %  switch/stay  (n-1)/(n-2)
     10,000      3     1      3,296     32.960%        6,704       67.040%        2.034        2.000
     10,000      4     1      2,503     25.030%        3,776       37.760%        1.509        1.500
     10,000      5     1      2,040     20.400%        2,667       26.670%        1.307        1.333
     10,000      5     2      4,044     40.440%        5,248       52.480%        1.298        1.333
     10,000      5     3      6,016     60.160%        8,036       80.360%        1.336        1.333
    100,000     50     1      1,952      1.952%        1,955        1.955%        1.002        1.021
    100,000     50    10     19,920     19.920%       20,298       20.298%        1.019        1.021
    100,000     50    25     49,960     49.960%       51,185       51.185%        1.025        1.021
  1,000,000    100     1      9,976      0.998%       10,225        1.022%        1.025        1.010
  1,000,000    100    10    100,261     10.026%      101,115       10.111%        1.009        1.010
  1,000,000    100    25    250,076     25.008%      252,402       25.240%        1.009        1.010
  1,000,000    100    50    500,018     50.002%      504,672       50.467%        1.009        1.010
  1,000,000    100    75    749,394     74.939%      756,846       75.685%        1.010        1.010
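A simulation along these lines can be sketched in a few lines of Python. This is only an illustration, not the program that produced the table: the function name `simulate`, the seed, and the assumption that a switching contestant picks uniformly at random among the other closed doors are all choices made for this sketch.

```python
import random

def simulate(n_doors, n_cars, trials, seed=0):
    """Return (stay_wins, switch_wins) for the generalized Monty Hall game.

    Assumes n_doors >= 3 and n_cars <= n_doors - 2, so Monty always has
    a goat door to open that is not the contestant's first pick.
    """
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatable runs
    doors = range(n_doors)
    stay_wins = switch_wins = 0
    for _ in range(trials):
        cars = set(rng.sample(doors, n_cars))  # doors hiding cars
        first = rng.randrange(n_doors)         # contestant's first pick
        # Monty opens one goat door other than the contestant's pick.
        opened = rng.choice([d for d in doors if d != first and d not in cars])
        # Assumption: a switcher picks at random among the other closed doors.
        second = rng.choice([d for d in doors if d != first and d != opened])
        stay_wins += first in cars
        switch_wins += second in cars
    return stay_wins, switch_wins

# Compare the observed switch/stay ratio with (n-1)/(n-2).
for n, k in [(3, 1), (5, 2), (50, 10)]:
    stay, switch = simulate(n, k, 10_000)
    print(n, k, stay, switch, round(switch / stay, 3), round((n - 1) / (n - 2), 3))
```

With only 10,000 trials the observed ratio fluctuates around (n-1)/(n-2), as it does in the table; more trials tighten the agreement.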

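Conditional Probability Formulas
• Recall that $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
• Exchange the roles of A and B: $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
• Therefore: $P(A \cap B) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A)$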

Example: Bad Loan Risks
• A bank’s loan officer knows that:
  • 5% of all loan applicants are bad risks.
  • 92% of all loan applicants who are bad risks are also rated bad risks by a credit advisory service.
  • 2% of all loan applicants who are actually good risks are nevertheless rated bad risks by the credit advisory service.
• What is the probability that a loan applicant who was rated a bad risk by the credit service is truly a bad loan risk?

Example: Bad Loan Risks, cont’d
• 5% of all loan applicants are bad risks.
• 92% of all loan applicants who are bad risks are also rated bad risks by a credit advisory service.
• 2% of all loan applicants who are actually good risks are nevertheless rated bad risks by the credit advisory service.
• Let A be the event that the credit advisory service rates a loan applicant a bad risk.
• Let B be the event that the loan applicant is truly a bad risk.
• What is P(B|A)? Who is truly a bad loan risk given a bad credit rating?
Tree diagram:
  Bad risks (B): P(B) = 0.05 → Bad credit (A): P(A|B) = 0.92
  Good risks (B', the complement of B): P(B') = 1 – 0.05 = 0.95 → Bad credit (A): P(A|B') = 0.02

Example: Bad Loan Risks, cont’d
• 5% of all loan applicants are bad risks.
• 92% of all loan applicants who are bad risks are also rated bad risks by a credit advisory service.
• 2% of all loan applicants who are actually good risks are nevertheless rated bad risks by the credit advisory service.
• What is P(B|A)?
• First calculate P(A), where B' is the event that the applicant is a good risk:
  $$P(A) = P(B)\,P(A \mid B) + P(B')\,P(A \mid B') = (0.05)(0.92) + (0.95)(0.02) = 0.065$$
• Then
  $$P(B \mid A) = \frac{P(B)\,P(A \mid B)}{P(A)} = \frac{0.046}{0.065} \approx 0.71$$
The probability that a loan applicant who is rated a bad risk by the credit advisory service is truly a bad loan risk is 0.71.

Example: Disgruntled Engineers
• A company does a survey to understand why its engineers are quitting. It discovers that:
  • 20% dislike their work → probability of quitting = 0.60
  • 50% feel underpaid → probability of quitting = 0.40
  • 30% dislike their boss → probability of quitting = 0.90
• What is the most likely reason that an engineer quit?
Tree diagram, where A is the event that an engineer quits:
  B1 (dislikes work): P(B1) = 0.20, P(A|B1) = 0.60
  B2 (feels underpaid): P(B2) = 0.50, P(A|B2) = 0.40
  B3 (dislikes boss): P(B3) = 0.30, P(A|B3) = 0.90

Example: Disgruntled Engineers, cont’d

Reason                P(Bi)  P(A|Bi)  P(Bi)·P(A|Bi)  P(Bi|A)
B1  Dislikes work      0.20     0.60           0.12    0.203
B2  Feels underpaid    0.50     0.40           0.20    0.339
B3  Dislikes boss      0.30     0.90           0.27    0.458
Sum                                            0.59    1.000

Each P(Bi|A) is the product P(Bi)·P(A|Bi) divided by the sum 0.59. Most likely the engineer disliked the boss.

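Bayes’ Theorem
If B1, ..., Bn are mutually exclusive events, one of which must occur, and A is another event with P(A) > 0, then
$$P(B_k \mid A) = \frac{P(B_k)\,P(A \mid B_k)}{\sum_{i=1}^{n} P(B_i)\,P(A \mid B_i)}$$
for a particular value of k = 1, 2, ..., or n.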
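The theorem translates directly into a few lines of Python. The sketch below is illustrative only: the function name `posteriors` and its list-based interface are assumptions made for this example, not something from the course materials.

```python
def posteriors(priors, likelihoods):
    """Apply Bayes' theorem to mutually exclusive, exhaustive events B1..Bn.

    priors[i]      is P(Bi)
    likelihoods[i] is P(A | Bi)
    Returns the list of posterior probabilities P(Bi | A).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(Bi) * P(A | Bi)
    total = sum(joint)                                    # P(A), by total probability
    return [j / total for j in joint]

# Disgruntled engineers: dislikes work, feels underpaid, dislikes boss.
engineers = posteriors([0.20, 0.50, 0.30], [0.60, 0.40, 0.90])
print([round(p, 3) for p in engineers])

# Bad loan risks: B = bad risk, B' = good risk; A = rated a bad risk.
loans = posteriors([0.05, 0.95], [0.92, 0.02])
print(round(loans[0], 2))
```

The denominator is computed once and shared by every posterior, which is why the posteriors always sum to 1.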

Bayes’ Theorem, cont’d
• A is an observable event.
  • An engineer quits.
• The probabilities P(Bi) are prior probabilities, where i = 1, 2, ..., n.
  • P(dislikes work), etc., before observing the quit event A.
• The problem is to find each posterior probability P(Bk|A) for k = 1, 2, ..., or n.
  • P(dislikes work | quit), etc., after observing the quit event A.

Thomas Bayes
Thomas Bayes (1701-1761) was an English statistician, philosopher, and minister. He formulated a specific case of the theorem that bears his name. However, he did not publish his most famous accomplishment. His friend Richard Price edited and published Bayes’s notes after Bayes’s death.
Bayesian statistics is based on the Bayesian interpretation of probability, which incorporates one’s degree of belief in an event. The degree of belief may be based on prior knowledge of the event, previous experimental results, or personal beliefs.

Thomas Bayes, cont’d
Bayesian statisticians’ interpretation of probability is different from the frequentists’ interpretation. Recall that the latter views probability as the limit of the relative frequency of an event after many experimental trials.
Many frequentist statisticians viewed Bayesian statistics unfavorably due to philosophical and practical considerations. But with modern computational power, the use of Bayesian statistics is increasing.
See: https://en.wikipedia.org/wiki/Thomas_Bayes and https://en.wikipedia.org/wiki/Bayesian_statistics

Example: Disease Test
• A disease affects 1 out of every 1,000 people.
• A test gives the correct result 90% of the time.
  • 90% of the time, it gives a positive result if you have the disease or a negative result if you don’t.
  • 10% of the time, it gives an incorrect result.
• Your test is positive. Do you have the disease?
Are data scientists smarter than doctors?

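Example: Disease Test, cont’d
• The disease affects 1 out of 1,000 people.
• The test is correct 90% of the time and wrong 10% of the time.
• Your test is positive. Do you have the disease?
• Let A be the event that the test is positive.
• Let B be the event that you have the disease, and B' the event that you do not.
• Then P(B) = 0.001, P(A|B) = 0.90, and P(A|B') = 0.10, so
  $$P(B \mid A) = \frac{(0.001)(0.90)}{(0.001)(0.90) + (0.999)(0.10)} = \frac{0.0009}{0.1008} \approx 0.0089$$
Less than 1% of the people who test positive actually have the disease.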
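The result is surprising enough that it is worth checking by brute force. The following sketch simulates a large number of people and counts how many of the positive tests belong to people who are actually sick. The function name `simulate_test`, its parameters, and the seed are hypothetical choices for this illustration.

```python
import random

def simulate_test(people, prevalence=0.001, accuracy=0.90, seed=0):
    """Estimate P(disease | positive test) by simulating `people` test results.

    Assumes `people` is large enough that at least one test comes back positive.
    """
    rng = random.Random(seed)  # arbitrary fixed seed for repeatability
    positives = true_positives = 0
    for _ in range(people):
        sick = rng.random() < prevalence
        correct = rng.random() < accuracy
        # A correct test reports the truth; an incorrect test reports the opposite.
        positive = sick if correct else not sick
        positives += positive
        true_positives += positive and sick
    return true_positives / positives

# Exact value from Bayes' theorem, for comparison with the simulation.
exact = (0.001 * 0.90) / (0.001 * 0.90 + 0.999 * 0.10)
print(round(exact, 4), round(simulate_test(500_000), 4))
```

Because the disease is rare, the false positives among the healthy majority far outnumber the true positives, which is what drives the posterior below 1%.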

Break

Statistics
• Statistics: The branch of science that deals with the collection and analysis of data.
• Broad categories of statistics:
  • Descriptive: Basic description of the data (measures of central tendency and measures of variability).
  • Analytical: Draw conclusions (“inferences”) about all the data based on samples drawn from the data.
  • Predictive: Use data mining, etc., on given data to make predictions about the attributes of new data.
  • Prescriptive: Improve the quality of decision-making.

Sampling a “Big Data” Population
• In statistics, the population is the entire dataset.
• The population’s mean, median, standard deviation, etc., are its population parameters.
• Problem: The population’s size is too large (or infinite) to calculate its parameter values.
• So you take samples from the population and compute each one’s sample statistics.
• Use the mean, median, etc., sample statistics to estimate the corresponding population parameters.

Sampling a “Big Data” Population, cont’d
• National pollsters don’t ask everybody his or her opinion. Instead, they ask samples taken from the population and use the results to infer what the entire population thinks.
• Challenges:
  • Were the members of the sample randomly selected and are they representative of the population?
  • What should the sample size be?
  • How many samples need to be taken?
  • How accurate are the estimates (predictions)?

Estimates of the Population Mean μ
• You want to predict the population mean, so you use the mean of a sample as an estimate.
• The Greek letter µ (“mew”) is the population mean.
• A sample of x values taken from the population has the mean $\bar{x}$.
• You take multiple samples, each with its own mean $\bar{x}$.

The Central Limit Theorem
• Take multiple samples from a population and calculate a certain statistic (mean, median, etc.) of each sample.
• The values of this statistic calculated individually from the samples will be approximately normally distributed, provided each sample is large enough.
• This is true even if the population itself does not have a normal distribution.
• The sample mean is itself a random variable.
• The mean of the distribution will be an estimate of the corresponding population parameter.

The Central Limit Theorem, cont’d
• Example: A frequency chart of a population of integer values 0 – 39, size 2,000,000:
  [Figure: frequency chart of the population]
• We want to know the mean µ of the population.
• Suppose we can’t access all the values of the population, or it’s too expensive to calculate using all the values.

The Central Limit Theorem, cont’d
• We take multiple samples from the population and calculate the mean of each sample.
• According to the Central Limit Theorem, the values calculated individually from all the samples are approximately normally distributed.
• The mean of this distribution of sample means is an estimate of µ.
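The procedure above can be tried out with a short Python sketch. Everything specific here is an assumption for illustration: the population is a synthetic, strongly skewed exponential one (not the population shown in the slides), and the helper name `sample_means`, the sizes, and the seed are arbitrary.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary seed so the run is repeatable

# A deliberately non-normal (right-skewed) synthetic population with mean near 30.
population = [rng.expovariate(1 / 30) for _ in range(200_000)]

def sample_means(population, sample_count, sample_size, rng):
    """Draw sample_count random samples and return the mean of each one."""
    return [statistics.fmean(rng.sample(population, sample_size))
            for _ in range(sample_count)]

means = sample_means(population, 2_000, 30, rng)

# The mean of the sample means estimates the population mean.
print(round(statistics.fmean(population), 2), round(statistics.fmean(means), 2))
```

A histogram of `means` (for example with Seaborn) looks roughly bell-shaped even though a histogram of `population` does not.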

The Central Limit Theorem, cont’d
• The population (µ = 29.51, which we want to know but suppose we can’t calculate):
  [Figure: frequency chart of the population]
• The normal distribution of the sample means:
  [Figure: histogram of the sample means]
• This sampling distribution of the mean itself has the mean 29.05, an estimate of µ.

The Number of Samples
• How many samples should you take from the population?
• Each sample should be the same size.
• The members of each sample should be randomly chosen, and each member of the population should have an equal probability of being chosen.
• The members of each sample should be representative of the members of the population as a whole.
• The more samples you take, the smoother the normal curve of the distribution of the mean.

The Number of Samples, cont’d
Population: size 2,000,000, mean µ = 29.51

Sample count  Sample size  Mean of sample means
         100           30                 29.36
       1,000           30                 29.01
      10,000           30                 28.98

The Size of Each Sample
• Increasing the sample size decreases the standard deviation of the distribution of the mean.
• The normal curve becomes narrower.
• For the best estimates, the sample size should be greater than 30.

The Size of Each Sample, cont’d
Population: size 2,000,000, mean µ = 29.51

Sample count  Sample size  Mean of sample means
      10,000           10                 29.15
      10,000           30                 29.01
      10,000          100                 28.98

The Standard Error
• The sampling distribution of the mean has its own standard deviation $\sigma_{\bar{x}}$, which is related to the population’s standard deviation $\sigma$:
  $$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
  where n is the size of the sample.
• The quantity $\sigma_{\bar{x}}$ is the standard error of $\bar{x}$.
• As the sample size n increases, the standard error decreases, thereby increasing the accuracy of the estimate of µ.
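The relationship between the standard error and the sample size can be checked numerically. In this sketch the population is a synthetic set of uniform random integers, and the helper name `observed_standard_error`, the sample counts, and the seed are all assumptions made for illustration. Sampling is done with replacement so that the formula σ/√n applies without a finite-population correction.

```python
import math
import random
import statistics

rng = random.Random(7)  # arbitrary seed for repeatability

# Synthetic population of integers; its standard deviation plays the role of sigma.
population = [rng.randint(0, 99) for _ in range(100_000)]
sigma = statistics.pstdev(population)

def observed_standard_error(sample_size, sample_count=2_000):
    """Standard deviation of the means of sample_count random samples."""
    means = [statistics.fmean(rng.choices(population, k=sample_size))
             for _ in range(sample_count)]
    return statistics.stdev(means)

# Compare the observed spread of the sample means with sigma / sqrt(n).
for n in (10, 30, 100):
    print(n, round(observed_standard_error(n), 2), round(sigma / math.sqrt(n), 2))
```

The two printed columns should agree closely, and both shrink as n grows.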

Other Sampling Distributions
• Besides the mean, we can use the Central Limit Theorem to estimate other population parameters.

Sampling Distribution of the Mean
Population: size 2,000,000, mean µ = 29.51, median = 27, standard deviation σ = 21.89

Sample count  Sample size  Est. mean  Standard error
      10,000           10      29.11            6.92
      10,000           30      29.05            4.00
      10,000          100      29.01            2.19

Sampling Distribution of the Median
Population: size 2,000,000, mean µ = 29.51, median = 27, standard deviation σ = 21.89

Sample count  Sample size  Est. median  Standard error of the mean
      10,000           10        26.94                        6.92
      10,000           30        27.12                        4.00
      10,000          100        27.29                        2.19

Sampling Distribution of the Standard Deviation
Population: size 2,000,000, mean µ = 29.51, median = 27, standard deviation σ = 21.89

Sample count  Sample size  Est. std. dev.  Standard error of the mean  Standard error × √n
      10,000           10           18.94                        6.92                21.88
      10,000           30           20.41                        4.00                21.91
      10,000          100           21.00                        2.19                21.9

Lab Assignment #6: Central Limit Theorem
• To demonstrate that the Central Limit Theorem works with populations that have various probability distributions, pick three different distributions to work with for this assignment.
  • uniform, normal, exponential, binomial, Poisson, etc.
• Generate random values from each distribution.
• For each distribution, use sampling to estimate any three population parameters.
  • mean, median, standard deviation, Q1, Q3, interquartile range, etc.

Lab Assignment #6, cont’d
• For each estimate, experiment with different numbers of samples and different sample sizes.
• Create Seaborn charts to illustrate your results.
• Write a short report (5 - 7 pages):
  • Describe which distributions and parameters you used for this assignment.
  • What were the results? What inferences were you able to make? Did you notice anything unusual that needed further exploration or research to explain?
• You can incorporate your report in your Jupyter notebook as Markdown cells and comments.
Jupyter notebooks due Monday, October 7
