# Statistics and Modelling Course - PowerPoint PPT Presentation

##### Presentation Transcript

1. Topic: Confidence Intervals. Achievement Standard 90642: Calculate Confidence Intervals for Population Parameters (3 credits, externally assessed). NuLake pages 63–101.

2. LESSON 1: Sampling. Handout with gaps to fill in (goes with the following slides). STARTER: Look at the following 2 examples of bad sampling technique and discuss what’s wrong in each case. 1. Discuss how you’d obtain a representative sample from our school roll. 2. Notes on sampling and inference. 3. Population and Samples ‘Policemen’ worksheet (from Achieving in Statistics). Complete for HW.

3. Sampling Describe some faults with each of these sampling methods.

4. Sampling. Describe some faults with each of these sampling methods. (a) A survey on magazine readership is conducted by phoning households between 1 and 4 pm. • People who aren’t at home during those times cannot be surveyed. • Some people don’t have a phone.

5. Sampling. Describe some faults with each of these sampling methods. (b) A talkback radio station asks listeners to phone in with a quick ‘yes’ or ‘no’ answer to the question “Should NZ have capital punishment?” • Only people who are listening at the time can participate. • Self-selected sample: people with a strong opinion are the most likely to ring in.

6. Sampling. You are asked the question: “How tall are St. Thomas students?” • You only have time to measure the heights of 35 students. Q1: How would you choose which 35 students to measure? Q2: Once you’ve measured your 35 students’ heights, how would you use this data to answer the question “How tall are St. Thomas students?”

7. Purpose of a Sample POPULATION Make an inference SAMPLE

8. Purpose of a Sample POPULATION Make an inference Inferences SAMPLE Sampling terminology

9. Purpose of a Sample POPULATION Make an inference Inferences SAMPLE Sampling terminology POPULATION: Target Population: All items under investigation. We usually just call it the “Population”. SAMPLES: Sample: Subset selected to REPRESENT the population. Sampling Frame: A list/database of items from which we select our sample (should include all items in the Target Population).

10. Sampling terminology. POPULATION: Target Population: All items under investigation. We usually just call it the “Population.” SAMPLES: Sample: Subset selected to REPRESENT the population. Sampling Frame: A list/database of items from which we select our sample (should include all items in the Target Population). For a sample to be representative of a given population, the Sampling Frame must match the Target Population.

25. A representative sample should have: • A sample size large enough for the results to be meaningful (rough guide: n > 30). • No bias: sample selection is said to be “biased” if some items are more likely to be chosen than others. Every item in the target population should be equally likely to be chosen; random selection ensures this. • Minimal non-response (difficult to control).
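
The “equally likely to be chosen” property of random selection can be checked by simulation. This is a minimal sketch, assuming Python’s `random.Random.sample` as the random selection method; the population of 10 items, the sample size of 2 and the helper name `selection_frequencies` are hypothetical. With random selection, each item should be picked in roughly n/N = 2/10 = 0.2 of the samples.

```python
import random
from collections import Counter

def selection_frequencies(N, n, trials, seed=0):
    """Estimate how often each item 1..N ends up in a random sample of size n."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        # rng.sample draws n different items; every item has the same chance
        counts.update(rng.sample(range(1, N + 1), n))
    return {item: counts[item] / trials for item in range(1, N + 1)}

# Hypothetical population of 10 items, samples of size 2
freqs = selection_frequencies(N=10, n=2, trials=20000)
for item, f in freqs.items():
    print(item, round(f, 3))  # each value should be close to n/N = 0.2
```

A biased method (for example, always taking the first two items on the list) would give some items a frequency of 1 and the rest 0.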

26. Example: A home security firm is hoping to sell as many burglar alarms as possible to householders in a certain town. Usually each house only needs one burglar alarm. Before the firm orders the alarms from their supplier, they wish to have an indication of how many alarms they might sell. 1.) What is the target population? A. all the people who live in the town. B. the head of each household. C. the houses in the town.

28. Example: A home security firm is hoping to sell as many burglar alarms as possible to householders in a certain town. Usually each house only needs one burglar alarm. Before the firm orders the alarms from their supplier, they wish to have an indication of how many alarms they might sell. 1.) What is the target population? Answer: C. the houses in the town. 2.) What is the sampling frame? A. the electoral roll for the town. B. a list of all the people who live in the town. C. a list of all the houses in the town.

30. Do the Population and Samples ‘Policemen’ worksheet. Finish by Monday; we will mark it as a class. Example: A home security firm is hoping to sell as many burglar alarms as possible to householders in a certain town. Usually each house only needs one burglar alarm. Before the firm orders the alarms from their supplier, they wish to have an indication of how many alarms they might sell. 1.) What is the target population? Answer: C. the houses in the town. 2.) What is the sampling frame? Answer: C. a list of all the houses in the town. 3.) How would you select a representative sample of the houses in the town? (discuss as a class)

31. EXTRA ON SAMPLING TECHNIQUES IF TIME (Scholarship students). Otherwise skip to Lesson 3: Distribution of Sample Means 1.

32. Extension Lesson: Other sampling techniques. Good sampling techniques: • Simple Random Sampling • Systematic Sampling • Stratified Sampling • Cluster Sampling. Bad sampling techniques (biased selection): • Convenience sampling • Self-selected sampling.

33. Random selection Q: What does the word “random” actually mean? Q: How would you select a student at random from this school?

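34. 21.03 Simple random sampling. Generate 20 different random numbers between 1 and 100. If a random number has already occurred, generate more as needed. Calculator formula: 1 + 100 × RAN# (ignore the decimal part). Results: 42 67 2 12 77 49 60 20 45 15 64 7 8 21 15 64 58 14 29 68 26 90. Here 15 and 64 each came up twice, so 22 numbers were generated to obtain 20 different ones.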
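
The calculator routine above can be imitated in code. This is a minimal sketch, assuming Python’s `random.Random.random()` (a value in [0, 1)) as a stand-in for RAN#; the function name `calculator_srs` is hypothetical. It keeps the integer part of 1 + N × RAN# and discards repeats until n different item numbers have been chosen.

```python
import random

def calculator_srs(N, n, seed=None):
    """Mimic keying '1 + N x RAN#' and pressing '=' repeatedly:
    keep the integer part of each result and discard repeats
    until n different item numbers (between 1 and N) have been chosen."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < n:
        value = 1 + N * rng.random()  # rng.random() lies in [0, 1), standing in for RAN#
        item = int(value)             # ignore the decimal part, giving 1..N
        if item not in chosen:        # discard any repeats
            chosen.append(item)
    return chosen

# 20 different numbers between 1 and 100, as on the slide
print(calculator_srs(N=100, n=20, seed=1))
# 35 students from a school roll numbered 1-600
print(calculator_srs(N=600, n=35, seed=2))
```

The same function covers the general procedure: set N to the population size and n to the sample size.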

35. 1. Simple Random Sampling • Obtain a list of all N items in the target population, numbering them 1 to N (e.g. the school roll: 1–600). • Decide how many you will select for your sample (n). • Use the random number generator on your calculator to select numbers at random between 1 and N. On the calculator, type: 1 + Population size × RAN# (ignore the decimal part). • Keep pressing ‘equals’ until you have selected n different items, discarding any repeats. Advantage of SR sampling: every item in the population has an equal chance of being selected, so the selection method itself is not biased.

36. Select a sample of 35 students from the St. Thomas school roll. • HW: Old Sigma Pg. 130, Ex. 9.1 (all), then Pg. 134, Ex. 9.2 (just Q1). 2. Decide how many you will select for your sample (n). 3. Use the random number generator on your calculator to select numbers at random between 1 and N. On the calculator, type: 1 + Population size × RAN# (ignore the decimal part). 4. Keep pressing ‘equals’ until you have selected n different items, discarding any repeats. Advantage of SR sampling: every item in the population has an equal chance of being selected, so the selection method itself is not biased. Disadvantage: • Does not ensure that all subgroups of the population are represented in proportion (e.g. some racial or socio-economic groups could be over- or under-represented by chance).

37. 3 other good sampling techniques. Systematic sampling: • Obtain a list of all N items in the target population (numbered 1 to N). • Pick a random starting point (e.g. item number 7). • Sample every k-th item after that, where k = N/n, until you have selected n items. Cluster sampling: Use when the population is distributed into naturally occurring groups or ‘clusters’ (e.g. towns and cities in NZ). Stage 1: Select a representative sample of the clusters themselves. Stage 2: Select a random sample of items within the chosen clusters, in proportion to the percentage of the population found in each (called Proportional Allocation). Stratified sampling: Use when the population consists of categories (strata) (e.g. racial groups). Divide the sampling frame into the strata (categories). Select a separate random sample from each stratum in proportion to the percentage of the population found in each (called Proportional Allocation).

38. Comparison of samples. 21.03 Simple random sampling Systematic sampling Cluster sampling Stratified sampling

39. 3 other good sampling techniques. Systematic sampling: • Obtain a list of all N items in the target population (numbered 1 to N). • Pick a random starting point (e.g. item number 7). • Sample every k-th item after that, where k = N/n, until you have selected n items. Cluster sampling: Use when the population is distributed into naturally occurring groups or ‘clusters’ (e.g. towns and cities in NZ). Stage 1: Select a representative sample of the clusters themselves. Stage 2: Select a random sample of items within the chosen clusters, in proportion to the percentage of the population found in each (Proportional Allocation). Stratified sampling: Use when the population consists of categories (strata) (e.g. racial groups). Divide the sampling frame into the strata (categories). Select a separate random sample from each stratum in proportion to the percentage of the population found in each (called Proportional Allocation). Tasks: • Select a sample of between 30 and 36 students from the school roll using each of these 3 methods. • Write down at least one advantage and at least one disadvantage/risk associated with each of these 3 techniques. • HW: Do Old Sigma (2nd edition) p137: Ex. 9.3.

40. 21.03 Systematic sampling. To obtain a systematic sample of size 20 from this data, choose a starting point at random between 1 and 100 using the calculator (1 + 100 × RAN#). Suppose this gives 5.87352, so start at item number 5. Then choose every k-th item, where k = N/n = 100/20 = 5. So sample every 5th item: 5, 10, 15, …, 100. (If the count runs past item 100, continue from item 1.)

41. Systematic Sampling • Obtain a list of all N items in the target population. • Decide on your sample size, n. • Pick a random starting point (e.g. item number 7). • Sample every k-th item after that, where k = N/n, until you have selected n items. Advantages: • Ensures that the sample is selected from throughout the breadth of the sampling frame. • Convenient and fast: it is easier to collect information on items that are in a sequence (every 5th house) than on a random sample where they are scattered all over.

42. 4. Sample every k-th item after that, where k = N/n, until you have selected n items. Advantages: • Ensures that the sample is selected from throughout the breadth of the sampling frame. • Convenient and fast: it is easier to collect information on items that are in a sequence (every 5th house) than on a random sample where they are scattered all over. Disadvantage: Be careful that the list itself has no systematic pattern. If every 2nd house on a street were sampled, all would be on the same side of the street!
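
The systematic sampling steps can be sketched as a short function. This is a minimal sketch under two assumptions not stated on the slides: when N/n is not a whole number, k is rounded down, and when the count runs past item N it wraps around to item 1 (often called circular systematic sampling). The name `systematic_sample` is hypothetical.

```python
def systematic_sample(N, n, start):
    """Return n item numbers from a list numbered 1..N, taking every k-th item
    from `start`, where k = N // n. Counting wraps back to item 1 after item N."""
    if not 1 <= start <= N:
        raise ValueError("start must be between 1 and N")
    k = N // n  # sampling interval k = N/n (rounded down if not whole)
    return [((start - 1 + i * k) % N) + 1 for i in range(n)]

# Worked example from slide 40: N = 100, n = 20, random start at item 5
print(systematic_sample(N=100, n=20, start=5))
# prints [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
```

With a later start such as 87, the sample begins 87, 92, 97 and then wraps to 2, 7, and so on, still giving 20 different items.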

43. Stratified sampling. 21.03 Suppose the avocados are of 3 different varieties: Hass (items 1–40, 40%), Fuerte (items 41–70, 30%) and Hopkins (items 71–100, 30%). The number from each stratum in the sample should be proportional to the size of that group in the population: Hass: 40% × 20 = 8; Fuerte: 30% × 20 = 6; Hopkins: 30% × 20 = 6.
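
Proportional allocation can also be written as a formula. The notation here is ours rather than the slides’: $N$ is the population size, $N_h$ the number of items in stratum $h$, $n$ the total sample size and $n_h$ the number to select from stratum $h$. Applied to the avocado example ($N = 100$, $n = 20$):

$$
\begin{aligned}
n_h &= \frac{N_h}{N}\times n \\
\text{Hass: } n_1 &= \tfrac{40}{100}\times 20 = 8 \\
\text{Fuerte: } n_2 &= \tfrac{30}{100}\times 20 = 6 \\
\text{Hopkins: } n_3 &= \tfrac{30}{100}\times 20 = 6
\end{aligned}
$$

The stratum sample sizes add back to the total: $8 + 6 + 6 = 20$.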

44. Stratified sampling. 21.03 Thus generate random numbers as follows, discarding repeats: Hass (1–40), 8 random numbers: 33 17 12 25 9 9 33 16 39 8 (9 and 33 repeat, so 10 numbers were needed). Fuerte (41–70), 6 random numbers: 58 59 67 43 53 56. Hopkins (71–100), 6 random numbers: 98 85 96 99 90 81.

45. Stratified sampling. Use when the population consists of categories (strata) and you wish to represent each ‘stratum’ proportionally (e.g. racial groups, one-storey and multi-storey homes within a city). • Obtain a list of all N items in the target population. • Decide on your sample size, n. • Divide the list into the strata (categories). • Select a separate random sample from each stratum in proportion to the percentage of the population found in each. Proportional Allocation: selecting from each stratum in proportion to its percentage of the population.

46. Obtain a list of all N items in the target population. • Decide on your sample size, n. • Divide the list into the strata (categories). • Select a separate random sample from each stratum in proportion to the percentage of the population found in each. Proportional Allocation: selecting from each stratum in proportion to its percentage of the population. E.g. if 12% of a city’s citizens are Pacific Islanders, then 12% of the sample should be selected from among the Pacific Island citizens.

47. 3. Divide the list into the strata (categories). 4. Select a separate random sample from each stratum in proportion to the percentage of the population found in each. Proportional Allocation: selecting from each stratum in proportion to its percentage of the population. E.g. if 12% of a city’s citizens are Pacific Islanders, then 12% of the sample should be selected from among the Pacific Island citizens. Advantage: guaranteed to represent each stratum in proportion. Disadvantage: time-consuming and expensive, because you must collect information about the strata sizes in advance.
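
The stratified sampling steps, applied to the avocado example, can be sketched as follows. This is a minimal sketch, assuming Python’s `random.Random.sample` for the separate random sample within each stratum; the helper names `proportional_allocation` and `stratified_sample` are hypothetical. Rounding each stratum’s share is also an assumption: for some population splits the rounded shares may not add up exactly to n, and you would then adjust one stratum by hand.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Number to sample from each stratum: n * N_h / N, rounded."""
    N = sum(stratum_sizes.values())
    return {name: round(n * size / N) for name, size in stratum_sizes.items()}

def stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to the list of item numbers in it.
    Draws a separate simple random sample from each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(items) for name, items in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: sorted(rng.sample(strata[name], allocation[name]))
            for name in strata}

# Avocado example from the slides: items 1-100 in three varieties
avocados = {
    "Hass": list(range(1, 41)),
    "Fuerte": list(range(41, 71)),
    "Hopkins": list(range(71, 101)),
}
sample = stratified_sample(avocados, n=20, seed=3)
for variety, items in sample.items():
    print(variety, len(items), items)
```

For this example the allocation works out to 8 Hass, 6 Fuerte and 6 Hopkins, matching the slide.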

48. Cluster sampling. Use when the population is distributed into naturally occurring groups or ‘clusters’ (e.g. towns and cities in a country). • Select a representative sample of the clusters themselves (there are usually too many clusters to sample from all of them). • Select a random sample of items from within each chosen cluster. • Again, use Proportional Allocation (as with stratified samples): weight the number selected from each cluster according to the cluster size. E.g. sampling New Zealanders by selecting a sample of towns/cities from throughout the country, then a proportional random sample from within each.

49. Select a representative sample of the clusters themselves (there are usually too many clusters to sample from all of them). • Select a random sample of items from within each chosen cluster. • Again, use Proportional Allocation (as with stratified samples): weight the number selected from each cluster according to the cluster size. E.g. sampling New Zealanders by selecting a sample of towns/cities from throughout the country, then a proportional random sample from within each. Advantage: • Cheaper and faster when sampling from a geographically large area (data can be collected in groups within the chosen clusters rather than being spread out).
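
The two stages of cluster sampling can be sketched in code. This is a minimal sketch under stated assumptions: Stage 1 picks the clusters by simple random sampling (the slides only ask for a representative sample of clusters), Stage 2 splits n among the chosen clusters in proportion to their sizes (rounded, so the total may be off by one), and the five towns, their sizes and the function name `cluster_sample` are all hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, n, seed=None):
    """Two-stage cluster sample.
    Stage 1: pick `num_clusters` clusters at random.
    Stage 2: split n among the chosen clusters in proportion to their size,
    then take a simple random sample inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    total = sum(len(clusters[c]) for c in chosen)
    sample = {}
    for c in chosen:
        share = round(n * len(clusters[c]) / total)  # proportional allocation
        sample[c] = rng.sample(clusters[c], share)
    return sample

# Hypothetical population: five towns with made-up resident IDs
towns = {
    "Town A": [f"A{i}" for i in range(500)],
    "Town B": [f"B{i}" for i in range(300)],
    "Town C": [f"C{i}" for i in range(200)],
    "Town D": [f"D{i}" for i in range(400)],
    "Town E": [f"E{i}" for i in range(100)],
}
result = cluster_sample(towns, num_clusters=3, n=30, seed=7)
for town, people in result.items():
    print(town, len(people))
```

Only the chosen towns need to be visited, which is where the cost saving described above comes from.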