
HR: Samples, Sampling, and Sample size


kaya

Presentation Transcript


  1. HR: Samples, Sampling, and Sample size A practical guide

  2. Samples, Sampling, and Sample Size • Samples: used in research for estimation and hypothesis testing; the topic covers sampling theory and why we sample (e.g. sampling distributions). • Sampling: the process of taking samples, which must guard against bias and threats to validity. • Sample size: the practical issue of how many subjects or units are needed for valid estimation or inference.

  3. Process of Sampling Involves: • Identification of study population (target) • Determination of sampling population (sampling frame) • Definition of the sampling unit (individual, family, etc.) • Choice of sampling method (what is possible, what is optimal) • Estimation of the sample size (depends on study question and study design)

  4. Basic Questions about Sampling • Why sample? • Efficiency and quality • Who to sample? • Usually a representative subset of the population of interest • How to sample? • Use the most appropriate sampling method. • Number to sample? • As many as required to keep potential sampling error within acceptable limits.

  5. Why sample? To acquire information about larger populations • Lower cost • Less field time • When it is impossible to study the whole population • More accuracy: a better job of data collection (more time per sample unit, higher-quality data)

  6. Who to sample? Identification of study population • Sampling is the process of selecting a number of units from a defined study population. • The study or target population is the one to which the results of the study will be generalized. • It is crucial that the study population is clearly defined, since it is the most important determinant of the sampling population.

  7. The Sampling Frame • The sampling frame is the population (in practice, the list of units) from which the sample is drawn. • The definition of the sampling frame by the investigator is governed by two factors: • Feasibility: a reachable sampling population • External validity: the ability to generalize from the study results to the target population.

  8. The Sampling Unit • To define the sampling unit, set: • Inclusion criteria • Exclusion criteria • May sample individuals, households, or larger units. • Consider the unit of analysis: individual income, household income, city median income.

  9. How to sample? Choices in sampling method: • Non-probability sampling • Probability sampling

  10. Non-probability sampling • Types of non-probability sampling: • Convenience sampling (units selected because they are easily accessible) • Quota sampling (a set number of units of each type) • Purposeful sampling (you choose who you think should be in the study) • Snowball sampling (friend of a friend, etc.) • Not recommended in health research if generalization or statistical analysis is intended: • These are by far the most bias-prone sampling procedures because selection is not random (units in the population do not have a known chance of being selected to participate in the study). • Analytical/statistical procedures usually assume the sampled units were drawn at random from the assumed statistical distribution.

  11. Probability sampling “There is a known non-zero probability of selection for each sampling unit” • Types: • Simple random sampling • Systematic random sampling • Stratified random sampling • Cluster sampling • Others: • Multi-stage random sampling • Multi-phase sampling

  12. Simple random sample • In this method, all subjects or elements have an equal probability of being selected. • There are two major ways of drawing a simple random sample: the first is to consult a random number table, and the second is to have the computer select the sample. • A complete enumeration (list) of the population is required/assumed.
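As a small illustration of the "have the computer select" option, here is a minimal Python sketch. It assumes the population has already been enumerated as a list of IDs; the function name `simple_random_sample`, the 500-unit frame, and the seed are illustrative choices, not part of the original slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit is equally likely to be chosen."""
    rng = random.Random(seed)          # seeded generator so the draw can be repeated
    return rng.sample(list(frame), n)  # sampling without replacement

# Hypothetical enumerated population: IDs 1..500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=1)
print(sorted(sample))
```

Fixing the seed only makes the draw repeatable for documentation purposes; it does not change the equal-probability property.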

  13. Systematic random sample • A systematic sample is drawn by randomly selecting a first case on a list of the population and then taking every kth case after it until the sample is complete. This is particularly useful if your list of the population is long. • For example, if your list were the phone book, you might start at the 17th person and then select every 50th person from that point on. • Sampling fraction: ratio between sample size and population size (the sampling interval k is its inverse).
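The procedure above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the list is already in hand, the interval is taken as the population size divided (integer division) by the sample size, and the name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then take every k-th unit."""
    k = len(frame) // n                 # sampling interval (inverse of the sampling fraction)
    if k < 1:
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    start = rng.randrange(k)            # random position 0..k-1 in the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical list of 1000 entries, sample of 20 -> interval of 50
frame = list(range(1, 1001))
picked = systematic_sample(frame, 20, seed=3)
print(picked[:5])
```

One caution that applies to any systematic sample: if the list has a periodic pattern that lines up with the interval, the sample can be biased.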

  14. Stratified sample • In a stratified sample, we sample either proportionately or equally from each stratum (subpopulation) so that all strata are represented. • For example, if our strata were cities in a country, we would make sure to sample from each of the cities. If our strata were gender, we would sample both men and women.
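A minimal sketch of the proportionate variant, in Python. The helper name `stratified_sample`, the `stratum_of` callback, and the 600/400 example population are assumptions made for illustration; because each stratum's share is rounded, the total can differ slightly from the requested size in other data sets.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, seed=None):
    """Proportionate allocation: each stratum contributes in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # group units by stratum
    total = len(units)
    result = {}
    for name, members in strata.items():
        share = max(1, round(n * len(members) / total))   # at least one unit per stratum
        result[name] = rng.sample(members, min(share, len(members)))
    return result

# Hypothetical population: 600 women and 400 men, overall sample of 50
units = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
by_stratum = stratified_sample(units, lambda u: u[0], 50, seed=2)
print({name: len(chosen) for name, chosen in by_stratum.items()})
```

For equal allocation, the `share` line would instead assign the same number to every stratum.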

  15. Cluster sampling • Cluster: a group of sampling units close to each other, i.e. crowding together in the same area or neighborhood. • In cluster sampling we take a random sample of clusters and then survey every member of each selected cluster. • For example, if our clusters were individual schools in a city, we would randomly select a number of schools and then test all of the students within those schools.
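The school example can be sketched as follows in Python. The mapping of school names to student lists and the function name `cluster_sample` are hypothetical; the point is only that randomness enters at the level of the group, and every member of a chosen group is then included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every unit inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names only
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical city with 10 schools of 30 students each
schools = {f"school_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(10)}
surveyed = cluster_sample(schools, 3, seed=7)
print({name: len(students) for name, students in surveyed.items()})
```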

  16. Cluster Samples of Households [Figure: households grouped into clusters labeled Section 1 to Section 5] Credit: Dr. Moataza Mahmoud Abdel Wahab, Lecturer of Biostatistics, High Institute of Public Health, University of Alexandria

  17. More Complex Sampling Methods • Multi-stage sampling: State → County → Town → Households → Person • Multi-phase sampling: Population → Sample T1 → Test 1 → Sample T2 → Test 2

  18. Number to sample? Estimation of the sample size • “How many subjects should be studied?” • The sample size depends on the following factors: • I. Difference to be found • II. Variability of the measurement • III. Level of significance • IV. Power of the study

  19. Difference to detect • “The magnitude of the difference to be detected” • A large sample size is needed to detect a small difference. • Thus, the sample size is inversely related to the size of the difference to be detected.

  20. Variability of the measurement • The variability of measurements is reflected by the standard deviation (SD) or the variance. • The higher the standard deviation, the larger the sample size required. • Thus, sample size is directly related to the SD.

  21. Level of significance • Relies on α error or type I error. The usual level of α has been arbitrarily set to 5% or 0.05. • Alpha error can be reduced to 0.01 or even 0.001, but this increases the required sample size. • Thus, sample size is inversely related to the level of α error. Alpha error is considered before the study begins, but is only important when a significant difference or association is found.

  22. Power of the study • The power of the study is the probability that it will yield a statistically significant result when a true difference or association exists. It is related to β or type II error. • Power is equal to (1 - β); consequently the power of the study is increased by decreasing the beta error. • Thus, sample size is inversely related to the level of β error or directly related to the power of the study. Beta error is considered before the study begins, but is only of consequence when no difference or association is found (in hypothesis testing studies). Beta error is not a consideration in surveys that are only estimating parameters (descriptive studies). Estimations are only concerned with confidence (i.e. confidence level) in the estimate.
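To make the link between α, power, and sample size concrete, the sketch below looks up the standard normal z-values that enter sample size formulas. It uses Python's standard-library `statistics.NormalDist`; the helper names `z_two_sided` and `z_for_power` are illustrative. A two-sided test is assumed.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_two_sided(alpha):
    """Critical value z(1 - alpha/2) for a two-sided test at significance level alpha."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)

def z_for_power(power):
    """z(1 - beta), where power = 1 - beta."""
    return STD_NORMAL.inv_cdf(power)

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: z = {z_two_sided(alpha):.3f}")
for power in (0.80, 0.90):
    print(f"power = {power}: z = {z_for_power(power):.3f}")
```

Smaller α and higher power both give larger z-values, and since these z-values are squared in the sample size formulas, the required n grows accordingly.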

  23. Sample Size related to the Research Question, Design, and Analysis • The research question usually informs on: • variables to be considered and level of measurement to be used. • it also points to design type and analysis to be used. • Research type/design may address: • Exploration, description, estimation (Descriptive Studies) • Hypothesis testing of differences or relationships (Analytic Studies) • Modeling of variables for relationships or survival (Multivariable Studies) • Sample size must consider the type/design plus the measurement level of the variables. • Descriptive studies only ask how good the estimate is (an alpha error question). • Analytic studies must also consider power (a beta error question). • Additional variables (three or more) normally require larger sample sizes to maintain power in subgroups.

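  24. Sample Size Determination: Calculations^ (the formula images did not survive in this transcript; the formulas below follow the cited reference)
      For confidence in an estimation (example: descriptive survey data; beta error not considered), with margin of error $E$:
      • Interval level variable, 1 sample: $n = \left(\frac{z\sigma}{E}\right)^2$
      • Interval level variable, 2 samples: $n_i = 2\left(\frac{z\sigma}{E}\right)^2$ per group
      • Nominal level variable, 1 sample: $n = p(1-p)\left(\frac{z}{E}\right)^2$
      • Nominal level variable, 2 samples: $n_i = \left[p_1(1-p_1) + p_2(1-p_2)\right]\left(\frac{z}{E}\right)^2$ per group
      For hypothesis testing (example: analytic studies), with effect size $ES$:
      • 1 sample: $n = \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2$
      • 2 samples: $n_i = 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2$ per group
      • Where, for an interval level variable: 1 sample $ES = \frac{|\mu_1 - \mu_0|}{\sigma}$; 2 samples $ES = \frac{|\mu_1 - \mu_2|}{\sigma}$
      • Where, for a nominal level variable: 1 sample $ES = \frac{|p_1 - p_0|}{\sqrt{p_0(1-p_0)}}$; 2 samples $ES = \frac{|p_1 - p_2|}{\sqrt{p(1-p)}}$ with $p = (p_1 + p_2)/2$
      ^SEE: Sullivan, Lisa M. (2008). Essentials of Biostatistics in Public Health. Jones and Bartlett, Sudbury, MA.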

  25. Four Research Questions • What is the blood sugar level in college students? • What proportion of male and female college students smoke? • Are smoking levels in college students different from the overall population? • Are blood sugar levels in college students different between males and females?

  26. Question 1: What is the blood sugar level in college students? Estimation/interval data/1 sample • If 95% confidence needed, z = 1.96 • Pilot survey estimate of standard deviation is 25 mg/dl • And, E (margin of error) is not to exceed 5 mg/dl • Then: n = (1.96 × 25 / 5)² ≈ 96
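The arithmetic for Question 1 can be checked with a short Python sketch. It assumes the one-sample estimation formula n = (z × SD / E)², which reproduces the slide's figure; the function name `n_for_mean_estimate` is illustrative.

```python
import math

def n_for_mean_estimate(z, sd, margin):
    """Sample size to estimate one mean to within +/- margin: n = (z * sd / margin)^2."""
    return (z * sd / margin) ** 2

raw = n_for_mean_estimate(z=1.96, sd=25, margin=5)
print(round(raw, 2))    # 96.04
print(math.ceil(raw))   # 97 if the result is rounded up rather than to the nearest integer
```

The slide reports 96 (nearest integer); rounding up to 97 is the more conservative convention, since it guarantees the margin of error is not exceeded.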

  27. Question 2: What proportion of male and female college students smoke? Estimation/nominal data/2 samples • If 95% confidence needed, z = 1.96 • Pilot survey estimate of male p = .25; female p = .2 • And, E (margin of error) is not to exceed 10% (.1) • Then: n = [.25(.75) + .2(.8)] × (1.96/.1)² ≈ 134 (per group)
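A hedged check of Question 2 in Python, assuming the two-sample estimation formula from the cited Sullivan text, n = [p1(1 - p1) + p2(1 - p2)] (z/E)² per group. The function name `n_for_two_proportions` is illustrative.

```python
import math

def n_for_two_proportions(z, p1, p2, margin):
    """Per-group n to estimate a difference in proportions to within +/- margin."""
    return (p1 * (1 - p1) + p2 * (1 - p2)) * (z / margin) ** 2

n_95 = n_for_two_proportions(z=1.96, p1=0.25, p2=0.20, margin=0.10)
n_90 = n_for_two_proportions(z=1.645, p1=0.25, p2=0.20, margin=0.10)
print(math.ceil(n_95))   # per group at 95% confidence
print(math.ceil(n_90))   # per group at 90% confidence
```

Under this formula the stated inputs give about 133.5 (134 per group) at 95% confidence; a 90% confidence level (z = 1.645) gives about 94.0, or 95 per group when rounded up.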

  28. Question 3: Are smoking levels in college students different from the overall population? Hypothesis testing/nominal data/1 sample • Set acceptable alpha at .05 (z for 1-α/2 = 1.96); power (1-β) at .8 (z for 1-β = .84) • Pilot survey estimate of college students p = .22; national average p = .3 • Effect size: ES = |.22 - .3| / √(.3 × .7) ≈ .17 • Then: n = ((1.96 + .84) / .17)² ≈ 272
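The following Python sketch reworks Question 3, assuming the one-sample formula n = ((z(1-α/2) + z(1-β)) / ES)² with ES = |p1 - p0| / √(p0(1 - p0)). Function names are illustrative. It also shows how sensitive the answer is to rounding the effect size.

```python
import math

def effect_size_one_proportion(p1, p0):
    """ES = |p1 - p0| / sqrt(p0 * (1 - p0)), using the null (population) proportion p0."""
    return abs(p1 - p0) / math.sqrt(p0 * (1 - p0))

def n_one_sample_test(es, z_alpha=1.96, z_beta=0.84):
    """n = ((z_alpha + z_beta) / ES)^2 for a two-sided one-sample test."""
    return ((z_alpha + z_beta) / es) ** 2

es = effect_size_one_proportion(p1=0.22, p0=0.30)
print(round(es, 4))                               # about 0.1746
print(math.ceil(n_one_sample_test(round(es, 2)))) # ES rounded to 0.17
print(math.ceil(n_one_sample_test(es)))           # ES left unrounded
```

With ES rounded to .17 the result is 272, matching the slide; carrying the unrounded ES through gives 258, so the rounded version is the more conservative of the two.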

  29. Question 4: Are blood sugar levels in college students different between males and females? Hypothesis testing/interval data/2 samples • Set acceptable alpha at .05 (z for 1-α/2 = 1.96); power (1-β) at .8 (z for 1-β = .84) • Pilot survey estimates: females, mean = 95 mg/dl, sd = 10; males, mean = 100 mg/dl, sd = 10 • Effect size: ES = |95 - 100| / 10 = .5 • Then: n = 2 × ((1.96 + .84) / .5)² ≈ 63 (per group)
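Question 4 can be verified the same way. This Python sketch assumes the two-sample formula n = 2 × ((z(1-α/2) + z(1-β)) / ES)² per group, with ES = |mean1 - mean2| / SD and a common SD in both groups; the function name `n_two_sample_means` is illustrative.

```python
import math

def n_two_sample_means(mean1, mean2, sd, z_alpha=1.96, z_beta=0.84):
    """Per-group n for comparing two means, assuming a common standard deviation."""
    es = abs(mean1 - mean2) / sd                 # standardized effect size
    return 2 * ((z_alpha + z_beta) / es) ** 2

raw = n_two_sample_means(mean1=95, mean2=100, sd=10)
print(round(raw, 2))    # 62.72
print(math.ceil(raw))   # 63 per group
```

Halving the difference to be detected (2.5 mg/dl instead of 5) would quadruple the required sample size, which is the inverse relationship described on the "Difference to detect" slide.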

  30. Other sources on Samples and Sample Size: • Many statistical programs include sample size calculators. Example: the “statcalc” utility in EpiInfo http://www.cdc.gov/epiinfo/downloads.htm • Many web sites include sample size information: http://www.stat.uiowa.edu/~rlenth/Power/ • Additional lecture materials on sampling and sample size: http://www.pitt.edu/~super1/lecture/lec19041/index.htm http://www.pitt.edu/~super1/lecture/lec0542/index.htm
