Probability Models for Distributions of Discrete Variables - PowerPoint PPT Presentation

aysel
Presentation Transcript

  1. Probability Models for Distributions of Discrete Variables

  2. Randomly select a college student. Determine x, the number of credit cards the student has. Here x = number of cards and p(x) = probability of x occurring.

  3. Populations / Samples: A population is a collection of all units of interest. Example: all college students. A sample is a collection of units drawn from the population. Example: any subcollection of college students. Probabilities go with populations. Scientific studies randomly sample from the entire population. Each unit in the sample is chosen randomly. The entire sample is random as well.

  4. Probability Models and Populations: For discrete data, a population and a sample are summarized the same way (for instance, as a table of values and accompanying relative frequencies). A probability distribution (or model) for a discrete variable is a description of values, with each value accompanied by a probability.
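
As a rough illustration of the "table of values and accompanying relative frequencies" summary, the sketch below builds such a table from raw observations. The sample is made up for this example and does not come from the slides; `relative_frequencies` is a hypothetical helper name.

```python
from collections import Counter

def relative_frequencies(data):
    """Return {value: relative frequency} for a list of discrete observations."""
    counts = Counter(data)
    n = len(data)
    return {value: counts[value] / n for value in sorted(counts)}

# Hypothetical sample: credit-card counts for 10 students (made-up data).
sample = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]
table = relative_frequencies(sample)

for value, freq in table.items():
    print(f"x = {value}: relative frequency = {freq:.2f}")
```

The same function would summarize a whole population; only the input list changes.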

  5. Probability Models and Populations: Definitions of Probability. 2. The probability of an event is the long-term (technically forever) relative frequency of occurrence of the event, when the experiment is performed repeatedly under identical starting conditions. 3. The probability of an event is the relative frequency of units in the population for which the event applies. To aggregate these meanings: the probability associated with an event is its relative frequency of occurrence over all possible ways the phenomenon can take place.
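
The long-run relative frequency definition can be explored with a small simulation. This is only a sketch: it assumes the credit-card model implied by the idealized data sets later in this transcript (probabilities 0.20, 0.30, 0.20, 0.15, 0.10, 0.05 for x = 0 to 5), and the seed and trial counts are arbitrary choices.

```python
import random

# Assumed model: taken from the idealized data sets later in the transcript.
values = [0, 1, 2, 3, 4, 5]
probs = [0.20, 0.30, 0.20, 0.15, 0.10, 0.05]

def relative_frequency(event, n_trials, rng):
    """Fraction of n_trials simulated draws for which event(x) is true."""
    draws = rng.choices(values, weights=probs, k=n_trials)
    return sum(1 for x in draws if event(x)) / n_trials

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
for n in (100, 10_000, 100_000):
    freq = relative_frequency(lambda x: x == 1, n, rng)
    print(f"n = {n:>7}: relative frequency of x = 1 is {freq:.4f}")
```

With more trials the printed relative frequency should settle near p(1) = 0.30, though any finite run only approximates it.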

  6. Probability Models: “All models are wrong. Some are useful.” (George Box, industrial statistician)

  7. A probability distribution for a discrete variable is tabulated with a set of values, x, and probabilities, p(x). Probabilities must be nonnegative.

  8. A probability distribution for a discrete variable is tabulated with a set of values, x, and probabilities, p(x). Probabilities must be nonnegative and must sum to 1 (within rounding error).
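
The two conditions can be checked mechanically. The sketch below is one possible check, not something from the slides: the function name and the tolerance of 0.005 for "within rounding error" are assumptions chosen for illustration.

```python
import math

def is_valid_distribution(probs, tol=0.005):
    """Check both conditions: no negative probabilities, and a total of 1
    within a rounding tolerance (tol is an arbitrary illustrative choice)."""
    nonnegative = all(p >= 0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return nonnegative and sums_to_one

print(is_valid_distribution([0.20, 0.30, 0.20, 0.15, 0.10, 0.05]))  # True
print(is_valid_distribution([0.50, 0.60, -0.10]))  # False: a negative value
print(is_valid_distribution([0.333, 0.333, 0.333]))  # True: 0.999 is within tolerance
print(is_valid_distribution([0.50, 0.30]))  # False: sums to 0.80
```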

  9. The mean μ of a probability distribution is the mean value observed for all possible outcomes of the phenomenon.

  10. Consider idealized data sets

  11. Idealized data set n = 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 Mean = 1.80 SD = 1.44

  12. Consider idealized data sets

  13. Idealized data set n = 1000 0 0 0 0 0 0 0 … 0 (200) 1 1 1 1 1 1 1 1 1 1 … 1 (300) 2 2 2 2 2 2 … 2 (200) 3 3 3 3 … 3 (150) 4 4 … 4 (100) 5 … 5 (50) Mean = 1.80 SD = 1.44

  14. Values for the mean and standard deviation don’t depend on the number of data values; they depend instead on the relative location of the data values – they depend on the distribution in relative frequency terms.
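
A quick way to see this is to build both idealized data sets and compare their summaries. The sketch below uses Python's `statistics` module; `build_data` is a hypothetical helper, and `pstdev` (which divides by n) is used on the assumption that each data set is treated as a whole population.

```python
from statistics import fmean, pstdev

def build_data(counts):
    """Expand {value: count} into a flat list of observations."""
    return [value for value, count in counts.items() for _ in range(count)]

small = build_data({0: 20, 1: 30, 2: 20, 3: 15, 4: 10, 5: 5})        # n = 100
large = build_data({0: 200, 1: 300, 2: 200, 3: 150, 4: 100, 5: 50})  # n = 1000

for data in (small, large):
    # Same relative frequencies, so the mean and SD agree despite different n.
    print(len(data), round(fmean(data), 2), round(pstdev(data), 2))
```

Both lines should report a mean of 1.8 and an SD of 1.44, since only the relative frequencies matter.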

  15. The mean μ of a probability distribution is the mean value observed for all possible outcomes of the phenomenon. Formula: $\mu = \sum x\,p(x)$. Here μ (the Greek letter “mu”, pronounced “myou”) is synonymous with “population mean”, and Σ is the summation symbol.

  16. Multiply each value by its probability Sum the products Mean = 1.80
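
The multiply-then-sum recipe can be written in a couple of lines. This sketch assumes the credit-card probabilities implied by the idealized data sets (0.20, 0.30, 0.20, 0.15, 0.10, 0.05 for x = 0 to 5); `distribution_mean` is a hypothetical name.

```python
values = [0, 1, 2, 3, 4, 5]
probs = [0.20, 0.30, 0.20, 0.15, 0.10, 0.05]

def distribution_mean(values, probs):
    """Multiply each value by its probability, then sum the products."""
    return sum(x * p for x, p in zip(values, probs))

mu = distribution_mean(values, probs)
print(round(mu, 2))
```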

  17. The standard deviation σ of a probability distribution is the standard deviation of the values observed for all possible outcomes of the phenomenon. Formula: $\sigma = \sqrt{\sum (x - \mu)^2\,p(x)}$. Here σ (the Greek letter “sigma”) denotes “population standard deviation”.

  18. First obtain the variance.
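
As a worked illustration, the variance is the probability-weighted sum of squared deviations from the mean. The arithmetic below assumes the credit-card distribution implied by the idealized data sets (probabilities 0.20, 0.30, 0.20, 0.15, 0.10, 0.05 for x = 0 to 5) and its mean of 1.80.

```latex
\begin{align*}
\sigma^2 &= \sum (x - \mu)^2\, p(x) \\
         &= (0 - 1.8)^2(0.20) + (1 - 1.8)^2(0.30) + (2 - 1.8)^2(0.20) \\
         &\quad + (3 - 1.8)^2(0.15) + (4 - 1.8)^2(0.10) + (5 - 1.8)^2(0.05) \\
         &= 0.648 + 0.192 + 0.008 + 0.216 + 0.484 + 0.512 = 2.06 \\
\sigma   &= \sqrt{2.06} \approx 1.44
\end{align*}
```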

  19. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 Mean = 1.80, SD = 1.44. Mean – SD = 0.36, Mean + SD = 3.24. Values within 1 SD of the mean: 65 / 100 = 65%.

  20. Mean = 1.80, SD = 1.44. Mean – SD = 0.36, Mean + SD = 3.24. Probability of a value within 1 SD of the mean: 0.30 + 0.20 + 0.15 = 0.65.
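
The two slides above reach the same answer by two routes: counting data values inside the interval, and adding the probabilities of the values inside it. The sketch below runs both on the idealized n = 100 data set. Variable names are illustrative, and `pstdev` (the divide-by-n form) is an assumption about how the SD is computed.

```python
from statistics import fmean, pstdev

counts = {0: 20, 1: 30, 2: 20, 3: 15, 4: 10, 5: 5}
data = [x for x, c in counts.items() for _ in range(c)]

mean, sd = fmean(data), pstdev(data)
low, high = mean - sd, mean + sd

# Route 1: count the data values that fall inside the interval.
share_from_data = sum(1 for x in data if low <= x <= high) / len(data)

# Route 2: add the probabilities of the values inside the interval.
probs = {x: c / len(data) for x, c in counts.items()}
share_from_probs = sum(p for x, p in probs.items() if low <= x <= high)

print(share_from_data, round(share_from_probs, 2))
```

Only the values 1, 2 and 3 fall inside the interval, so both routes give 0.65.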

  21. x = # children in randomly selected college student’s family.

  22. x = # children in randomly selected college student’s family. 0.2194 = 21.94% of all college students come from a 1 child family.

  23. Guess at mean? Above 2 (right skew implies mean > mode).

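  24. To determine the mean, multiply values by probabilities, x·p(x), and sum these. Note that 55/10 = 5.50 (the plain average of the ten values) is not the mean, and 1.000/10 = 0.10 (the plain average of the ten probabilities) is not the mean.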

  25. To determine the variance, multiply squared deviations from the mean by probabilities, $(x - \mu)^2\,p(x)$, and sum these.

  26. The standard deviation is the square root of the variance. Examining the data set consisting of # of children in the family recorded for all students: The mean is 2.743; the standard deviation is 1.468.
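
The full calculation for the number-of-children distribution might look like the sketch below. One caveat: the transcript quotes every probability except p(5), so p(5) is inferred here as 1 minus the sum of the others. That is an assumption, not a figure from the slides.

```python
from math import sqrt

# Probabilities quoted in the slides for x = 1..4 and x = 6..10.
p = {1: 0.2194, 2: 0.2806, 3: 0.2329, 4: 0.1442,
     6: 0.0317, 7: 0.0124, 8: 0.0043, 9: 0.0005, 10: 0.0003}
# Assumption: p(5) is not quoted, so infer it from the total of 1.
p[5] = round(1 - sum(p.values()), 4)

mean = sum(x * px for x, px in p.items())
variance = sum((x - mean) ** 2 * px for x, px in p.items())
sd = sqrt(variance)

# The plain average of the values ignores the probabilities entirely.
naive_average = sum(p.keys()) / len(p)

print(f"p(5) = {p[5]}, mean = {mean:.2f}, SD = {sd:.3f}, naive = {naive_average}")
```

With these rounded probabilities the mean comes out near 2.7435, consistent with the 2.743 on the slide, and the SD is about 1.468. The naive average is 5.5, which is why 55/10 is not the mean.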

  27. Determine the probability a student is from a family with more than 5 children. P(x > 5)

  32. Determine the probability a student is from a family with more than 5 children. P(x > 5) = 0.0317 + 0.0124 + 0.0043 + 0.0005 + 0.0003

  33. Determine the probability a student is from a family with more than 5 children. P(x > 5) = 0.0317 + 0.0124 + 0.0043 + 0.0005 + 0.0003 = 0.0492

  34. Determine the probability a student is from a family with more than 5 children. P(x > 5) = 0.0492. So 4.92% of all college students come from families with more than 5 children (they have 5 or more brothers and sisters).

  35. Determine the probability a student is from a family with at most 3 children. P(x ≤ 3) = 0.2194 + 0.2806 + 0.2329 = 0.7329

  36. Determine the probability a student is from a family with at least 7 children. P(x ≥ 7) = 0.0124 + 0.0043 + 0.0005 + 0.0003 = 0.0175. Good idea: take the reciprocal of a small probability. 1/0.0175 = 57.1, so about 1 in 57 students.

  37. Determine the probability a student is from a family with fewer than 5 children. P(x < 5) = 0.2194 + 0.2806 + 0.2329 + 0.1442 = 0.8771

  38. Equivalent phrasings. “At most 3”, “less than or equal to 3”, and “no more than 3” all mean x ≤ 3. “At least 7”, “greater than or equal to 7”, and “no fewer/less than 7” all mean x ≥ 7.
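
Each phrase translates into an inequality, and the probability is the sum of p(x) over the values satisfying it. The sketch below works through the four events from the preceding slides. The value p(5) = 0.0737 is not quoted in the transcript; it is inferred so that the probabilities total 1, and `prob` is a hypothetical helper.

```python
# p(5) = 0.0737 is an inferred value (assumption); the rest are from the slides.
p = {1: 0.2194, 2: 0.2806, 3: 0.2329, 4: 0.1442, 5: 0.0737,
     6: 0.0317, 7: 0.0124, 8: 0.0043, 9: 0.0005, 10: 0.0003}

def prob(event):
    """Sum p(x) over every value x for which event(x) is true."""
    return sum(px for x, px in p.items() if event(x))

more_than_5 = prob(lambda x: x > 5)    # "more than 5"
at_most_3 = prob(lambda x: x <= 3)     # "at most 3", "no more than 3"
at_least_7 = prob(lambda x: x >= 7)    # "at least 7", "no fewer than 7"
fewer_than_5 = prob(lambda x: x < 5)   # "fewer than 5"

print(round(more_than_5, 4), round(at_most_3, 4),
      round(at_least_7, 4), round(fewer_than_5, 4))
print(f"about 1 in {1 / at_least_7:.0f} students")
```

The strict and non-strict inequalities matter for discrete variables: x > 5 excludes 5, while x ≥ 7 includes 7.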

  39. Determine the probability that the number of children in a student’s family falls within 1 standard deviation of the mean. Guess? 0.68

  40. Determine the probability that the number of children in a student’s family falls within 1 standard deviation of the mean. Mean μ = 2.743, SD σ = 1.468. 1 SD below the mean: 2.743 – 1.468 = 1.275. 1 SD above the mean: 2.743 + 1.468 = 4.211.

  41. Determine the probability that the number of children in a student’s family falls within 1 standard deviation of the mean. 1 SD below the mean = 1.275. 1 SD above the mean = 4.211. Values are within 1 SD of the mean if they are between these.

  43. Determine the probability that the number of children in a student’s family falls within 1 standard deviation of the mean. 1 SD below the mean = 1.275. 1 SD above the mean = 4.211. Values are within 1 SD of the mean if they are between these. The probability of being between these: 0.2806 + 0.2329 + 0.1442 = 0.6577

  44. Determine the probability that the number of children in a student’s family falls within 2 standard deviations of the mean. Guess? 0.95. 2 SD below the mean: 1.275 – 1.468 = –0.193. 2 SD above the mean: 4.211 + 1.468 = 5.679. So between –0.193 and 5.679.

  45. Determine the probability that the number of children in a student’s family falls within 2 standard deviations of the mean. Between –0.193 and 5.679. (Equivalent to 5 or fewer.)

  46. Determine the probability that the number of children in a student’s family falls within 2 standard deviations of the mean. Between –0.193 and 5.679. (Equivalent to 5 or fewer.) We know an outcome of more than 5 has probability 0.0492.

  47. Determine the probability that the number of children in a student’s family falls within 2 standard deviations of the mean. Between –0.193 and 5.679. (Equivalent to 5 or fewer.) We know an outcome of more than 5 has probability 0.0492. The probability of an outcome of at most 5 is 1 – 0.0492 = 0.9508.

  48. Determine the probability that the number of children in a student’s family falls within 2 standard deviations of the mean. Between –0.193 and 5.679: the probability is 0.9508.
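
The within-k-standard-deviations calculation generalizes to any k, and the complement rule gives a shortcut for the k = 2 case. The sketch below uses the mean and SD quoted on the slides (2.743 and 1.468). The value p(5) = 0.0737 is inferred so the probabilities total 1 (an assumption, since the transcript does not quote it), and `prob_within` is a hypothetical helper.

```python
MEAN, SD = 2.743, 1.468  # values quoted on the slides

# p(5) = 0.0737 is an inferred value (assumption); the rest are from the slides.
p = {1: 0.2194, 2: 0.2806, 3: 0.2329, 4: 0.1442, 5: 0.0737,
     6: 0.0317, 7: 0.0124, 8: 0.0043, 9: 0.0005, 10: 0.0003}

def prob_within(k):
    """Probability that x lies within k standard deviations of the mean."""
    low, high = MEAN - k * SD, MEAN + k * SD
    return sum(px for x, px in p.items() if low <= x <= high)

within_1 = prob_within(1)  # values 2, 3, 4
within_2 = prob_within(2)  # values 1 through 5

# Complement rule: P(x <= 5) = 1 - P(x > 5)
via_complement = 1 - sum(px for x, px in p.items() if x > 5)

print(round(within_1, 4), round(within_2, 4), round(via_complement, 4))
```

The direct sum and the complement route agree at 0.9508, close to the guess of 0.95, while the 1 SD result of 0.6577 is close to the guess of 0.68.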

  49. Pollutant Particles in Streamwater: A company monitors pollutants downstream of discharge into a stream. Data were collected on 200 days from a point 1 mile downstream of the plant on Stream A. Data were collected on 100 days from a point 1 mile downstream of the plant on Stream B.

  50. How do means compare? (What are the means?) How do SDs compare? (What are the SDs?)