
Population and sample

Population and sample. Chavanant Sumanasrethakul MD., M.Sc. Department of Preventive and Social Medicine BMA Medical College and Vajira Hospital 13 Oct 2009. Key aims of a study. Apply the study results to the target population.


Presentation Transcript


  1. Population and sample Chavanant Sumanasrethakul MD., M.Sc. Department of Preventive and Social Medicine BMA Medical College and Vajira Hospital 13 Oct 2009

  2. Key aims of a study • Apply the study results to the target population • A correct population frame is needed so that the study population represents the target population well • An appropriate sample size is needed

  3. Statistical Inference (diagram: population, sample, inference) • The process of making generalisations about a population from a sample.

  4. Population and sample • Population: all members to whom the study results can be applied; these may be individuals, groups of people, organisations, or events • Target population: the population narrowed down by a more specific frame or set of criteria to match the purpose of the study; the frame depends on the objectives, the research question, or criteria already established by others

  5. Population and sample • Study population: the population that passes the selection criteria for the study and is a good representative, covering almost all of the target population. The eligibility criteria comprise both inclusion and exclusion criteria, based on 1) the choice of study setting and 2) characteristics of the population, such as sex, age, stage of disease, ability to be followed up, co-existing diseases, and consent

  6. Population → (Research Question) → Target Population → (Eligibility Criteria) → Study Population → (Sampling) → Sample → (Allocation/Randomization) → Control Group / Experiment Group

  7. Points to consider • Representativeness • External validity • Sampling bias • Response bias • Measurement error

  8. Example • Imagine that a random sample of 100 individuals is to be selected from a population • Their height in cm is measured • The mean height is computed • Another random sample of 100 individuals from the same population is taken • Their height in cm is measured • Their mean height is computed • This is repeated until 20 random samples have been taken

  9. 20 samples of size 100 The first sample of heights of 100 people gives a mean of 172.03 cm and a standard deviation (SD) of 6.03 cm. The second sample gives mean 173.50 cm & SD 6.74 cm. These figures represent the mean height (cm) for each of the 20 random samples 172.03 173.50 171.89 171.95 170.59 172.63 172.72 171.99 172.50 171.71 172.55 172.86 171.58 172.83 172.55 171.28 172.62 171.41 171.38 172.26
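The 20 sample means above can be summarised directly. The sketch below uses only the figures listed on the slide and compares the observed spread of the sample means with SD / sqrt(n) computed from the first sample. That comparison is an illustration added here, not part of the original slide.

```python
import statistics

# Mean heights (cm) of the 20 random samples listed on the slide.
sample_means = [
    172.03, 173.50, 171.89, 171.95, 170.59, 172.63, 172.72, 171.99, 172.50, 171.71,
    172.55, 172.86, 171.58, 172.83, 172.55, 171.28, 172.62, 171.41, 171.38, 172.26,
]

grand_mean = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)  # observed spread of the sample means

# Standard error suggested by the first sample alone: SD / sqrt(n)
predicted_se = 6.03 / 100 ** 0.5

print(f"mean of sample means: {grand_mean:.2f} cm")
print(f"SD of sample means:   {sd_of_means:.2f} cm")
print(f"SD / sqrt(n):         {predicted_se:.2f} cm")
```

The sample means vary far less than individual heights do, which is why the histograms of means on the following slides are narrow.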

  10. Histogram of means of 20 samples

  11. Histogram of means of 100 samples

  12. Methods of sampling

  13. Simple random sampling • Each item in the population has an equal chance of being selected for the sample • Random number table: • 84 42 56 53 87 75 • 78 87 77 03 57 09 • 85 86 48 86 12 39 • 65 37 93 76 46 11 • 09 49 41 73 76 49 • 64 06 71 99 37 06 • 46 69 31 24 33 52 • 67 85 07 75 56 96
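In practice a pseudo-random number generator usually replaces the printed table. A minimal sketch, assuming a hypothetical sampling frame of patient IDs 1 to 100 (the frame, the function name, and the seed are illustrative only):

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

patient_ids = list(range(1, 101))  # hypothetical sampling frame: IDs 1..100
chosen = simple_random_sample(patient_ids, 10, seed=2009)
print(sorted(chosen))
```

Fixing the seed makes the draw reproducible, which helps when the selection has to be documented.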

  14. Systematic sampling • Advantages - It is simpler and can be more representative than a simple random sample • Disadvantages - possibility of implicit clustering, not a simple random sample
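The slide does not spell out the procedure, so the sketch below follows the usual textbook description of systematic sampling: choose a random starting point within the first interval, then take every k-th unit from the frame. The frame and function name are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first interval, then take every k-th unit."""
    k = len(frame) // n                        # sampling interval
    start = random.Random(seed).randrange(k)   # random start in 0..k-1
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))  # hypothetical ordered list of 100 units
picked = systematic_sample(frame, 10, seed=1)
print(picked)
```

If the frame is ordered in a way that repeats every k units, every selected unit shares that pattern, which is the implicit clustering the slide warns about.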

  15. Stratified sampling • Increases the representativeness of the sample • Increases the precision of the resulting estimates • Allows comparison between strata
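One common form is proportional allocation, where the same sampling fraction is drawn independently from each stratum. This is a sketch under that assumption; the strata, their sizes, and the function name are made up for illustration.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw the same sampling fraction independently from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n = round(len(units) * fraction)   # stratum sample size
        sample[name] = rng.sample(units, n)
    return sample

# Hypothetical frame: 60 men and 40 women.
strata = {
    "male": [f"M{i}" for i in range(60)],
    "female": [f"F{i}" for i in range(40)],
}
picked = stratified_sample(strata, 0.10, seed=1)
print({name: len(units) for name, units in picked.items()})
```

Because each stratum is sampled separately, the sample keeps the population's 60:40 split by construction.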

  16. Cluster sampling (diagram: a sample of schools is drawn from the population of schools; the sampled clusters 1, 2, (…), 20 contain n1, n2, (…), n20 children) • Disadvantages - Less representative than a simple random sample and there is a danger of contamination between respondents • Advantages - Cheaper and faster than a simple random sample
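A minimal sketch of cluster sampling in the spirit of the schools example: whole schools are selected at random and every child in a selected school is included. The frame of 50 schools and the names used are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: clusters[c] for c in chosen}

# Hypothetical frame: 50 schools with differing numbers of children.
schools = {
    f"school_{i:02d}": [f"s{i:02d}_child_{j}" for j in range(10 + i)]
    for i in range(1, 51)
}
sampled = cluster_sample(schools, 20, seed=7)
print(sum(len(children) for children in sampled.values()), "children in 20 schools")
```

The total number of children is not fixed in advance, because it depends on which schools happen to be drawn.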

  17. Multi-stage sampling (diagram: areas, hospitals (H), patients) • Different sampling units are sampled at different stages • Example: geographical areas of Thailand would be randomly selected, from which hospitals would be randomly selected, from which wards/patients would then be randomly selected.
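The three stages in the example can be sketched as nested random draws. Everything below (area names, hospital counts, patient lists, stage sizes) is hypothetical and only illustrates the structure.

```python
import random

def multistage_sample(areas, n_areas, n_hospitals, n_patients, seed=None):
    """Stage 1: areas; stage 2: hospitals within areas; stage 3: patients within hospitals."""
    rng = random.Random(seed)
    result = []
    for area in rng.sample(sorted(areas), n_areas):
        hospitals = areas[area]
        for hospital in rng.sample(sorted(hospitals), n_hospitals):
            result.extend(rng.sample(hospitals[hospital], n_patients))
    return result

# Hypothetical frame: 6 areas, 4 hospitals per area, 30 patients per hospital.
areas = {
    f"area_{a}": {
        f"hospital_{a}_{h}": [f"patient_{a}_{h}_{p}" for p in range(30)]
        for h in range(4)
    }
    for a in range(6)
}
patients = multistage_sample(areas, n_areas=2, n_hospitals=2, n_patients=5, seed=3)
print(len(patients), "patients selected")
```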

  18. Non-probability sampling • Quota sampling (the number of subjects is fixed in advance) • Convenience sampling • Dimensional sampling • Purposive sampling • Snowball sampling

  19. Probability sampling • Advantages • (easy to analyse) • should be unbiased • known precision • Disadvantages • requires sampling frame to be known • expense

  20. Non-probability sampling • Advantages • cheap • Quick • Disadvantages • may be biased • no estimate of precision

  21. Sample size

  22. Sample size • Why is it so important? • How do you calculate it? • What information do you need?

  23. Sample size (i) • Sample size important because : • too small a sample will fail to detect clinically important effects • too large a sample will identify statistically significant differences which have no clinical relevance • both situations unethical

  24. Factors which determine sample size calculation • The study design • (Survey), comparative, repeated measures • The outcome measure • categorical, continuous • The statistical analysis • t-test, chi-squared, regression

  25. Comparative studies • Null hypothesis: • Treatment A = Treatment B • Alternative hypothesis • Treatment A ≠ Treatment B

  26. Significance level and power • Significance Level (Type I error rate) • This is the probability that the statistical test returns a significant result when there is no difference between treatments • Power (1 − Type II error rate) • This gives the probability that a study of a given size would detect, as statistically significant, a real difference of a given magnitude
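Sample size formulas use the significance level and power through standard normal quantiles. A small sketch, assuming a two-sided 5% significance level and 80% power (the values used in the examples on later slides):

```python
from statistics import NormalDist

alpha = 0.05   # two-sided significance level (Type I error rate)
power = 0.80   # 1 - beta, where beta is the Type II error rate

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for a two-sided test
z_beta = NormalDist().inv_cdf(power)           # quantile for the chosen power

print(f"z_alpha = {z_alpha:.2f}, z_beta = {z_beta:.2f}")
```

Raising the power or lowering the significance level makes both quantiles larger, and with them the required sample size.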

  27. Factors which affect sample size • The proportion with/average of the feature of interest in the control group • Minimum clinically significant difference • The smallest difference in effectiveness between two treatments that would be deemed to be clinically relevant • Significance level • Power • Measure of variability

  28. Example of categorical data • How many children with acute otitis media are needed in a clinical trial to test whether antibiotic treatment changes the proportion of children with normal eardrums? • Statistical test for this sort of data? • The null hypothesis in this example? • What information do we need to calculate the sample size?

  29. That's the way to do it! • The proportion with the feature of interest in the control group, i.e. • An estimated proportion with normal eardrums when not treated, 75% • The increase/decrease in the proportion of interest judged to be clinically significant • In this case, 10% • Power and significance levels • 80% power and 5% significance

  30. Continuous outcome • How many patients are needed in a trial to evaluate the effect of supplementary vitamin D given to pregnant women in preventing neonatal hypocalcaemia? (Want to increase the mean serum calcium level!) • Statistical Test? • Null hypothesis?

  31. Continuous (continued) • Difference in mean serum calcium level that would be considered a clinical improvement • Increase of 0.5mg per 100ml • Estimate of the variation between infants in their serum calcium level • sd = 1.8 mg per 100ml (where do we get this info?) • Power and significance level • 80% and 5 % • Calculate standardised difference • standardised difference = difference/sd
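With the figures on this slide, the standardised difference works out as follows (simple arithmetic, rounded to two decimals):

```latex
\[
\text{standardised difference} = \frac{\text{difference}}{\text{sd}} = \frac{0.5}{1.8} \approx 0.28
\]
```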

  32. Limitations • Sample size calculation is not an exact science; it could be considered informed guesswork • Need for general consensus on what constitutes a clinically important difference • Reporting a sample size calculation does not imply that the study is adequately powered: the study may not have recruited as many people as anticipated, or may have suffered dropouts or loss to follow-up

  33. The truth is out there • An accurate inference is required to be both precise and unbiased • Steps should be taken to ensure that this is so • Minimise bias by • (adequate) randomisation • concealment and blinding • checking that missing data does not follow patterns • Maximise precision • formal statistical power calculation • CONSORT statement! (see references)

  34. Quantitative Method (quantitative measurement) • Qualitative Method (qualitative measurement)

  35. Alpha and beta errors

  36. Estimation in a single group

  37. Proportion response variable • The one-sample problem: estimating the population proportion (P) • $n = \dfrac{Z^2 P(1-P)}{d^2}$ • where d = precision or difference ($|p - P|$); Z = 1.96 at 95% CI, 1.64 at 90% CI; P = the proportion in the population, or the true proportion

  38. Example • We want to estimate the proportion of elderly people with hypertension. The prevalence in a previous study was 20%, and the acceptable error is 5% • P = 0.2; 1 − P = 0.8; $Z_\alpha$ at 5% = 1.96; d = 0.05, or 5% (absolute difference) • n = 246

  39. Example: comparing n for different P and d at α = .05 • P = 0.2: d = 0.02, n = 1537; d = 0.03, n = 683; d = 0.04, n = 384; d = 0.05, n = 246 • P = 0.3: d = 0.02, n = 2017; d = 0.03, n = 896; d = 0.04, n = 504; d = 0.05, n = 323
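The figures in this comparison follow from the one-sample proportion formula n = Z² P(1 − P) / d² with Z = 1.96. The sketch below recomputes them; the function name is illustrative, and values are rounded to the nearest integer because that is what reproduces the slide's numbers.

```python
def n_for_proportion(p, d, z=1.96):
    """n = z^2 * p * (1 - p) / d^2 for estimating a single proportion."""
    return z ** 2 * p * (1 - p) / d ** 2

for p in (0.2, 0.3):
    for d in (0.02, 0.03, 0.04, 0.05):
        print(f"P={p}  d={d}  n={round(n_for_proportion(p, d))}")
```

A more conservative convention is to round up instead, which would change three entries by one (for example 384.16 becomes 385).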

  40. Continuous response variables • The one-sample problem: estimating the population mean • $n = \dfrac{Z_\alpha^2 \sigma^2}{d^2}$ • where $\sigma^2$ = variance of the data; d = precision, or $|\bar{X} - \mu|$
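As an illustration of the one-sample formula for a mean, the sketch below reuses the SD of 6.03 cm from the earlier height example and assumes a desired precision of 1 cm. That precision is a made-up value, not one given in the slides.

```python
import math

def n_for_mean(sd, d, z=1.96):
    """n = z^2 * sd^2 / d^2, rounded up to a whole subject."""
    return math.ceil(z ** 2 * sd ** 2 / d ** 2)

# sd from the height example; d = 1 cm is an assumed precision.
n = n_for_mean(sd=6.03, d=1.0)
print(n)
```

Halving d to 0.5 cm would roughly quadruple the required sample, since n scales with 1/d².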

  41. Difference between two independent groups

  42. Hypothesis testing • $H_0: P_1 - P_2 = 0$ • $H_a: P_1 - P_2 \neq 0$ • $n/\text{group} = \dfrac{\left\{ Z_\alpha \sqrt{2P_c(1-P_c)} + Z_\beta \sqrt{P_t(1-P_t) + P_c(1-P_c)} \right\}^2}{(P_t - P_c)^2}$ • where $P_t$ = treatment, $P_c$ = control; $Z_\alpha$ at two-tailed (1.96) or one-tailed test (1.64); $Z_\beta$ at one-tailed test; power = $1 - \beta$ • Note: this formula is derived for equal-sized groups
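The otitis media example from the earlier slides (75% normal eardrums untreated, a 10% change, 80% power, 5% significance) can be run through this formula. The sketch reads the formula as {Zα √(2Pc(1−Pc)) + Zβ √(Pt(1−Pt) + Pc(1−Pc))}² / (Pt − Pc)², and it assumes the 10% is an absolute increase to 85% and that the test is two-tailed. Both are assumptions made here, not statements from the slides.

```python
import math
from statistics import NormalDist

def n_per_group_two_proportions(p_c, p_t, alpha=0.05, power=0.80):
    """Per-group n for comparing two proportions, formula as written on the slide."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed (assumption)
    z_b = NormalDist().inv_cdf(power)
    numerator = (z_a * math.sqrt(2 * p_c * (1 - p_c))
                 + z_b * math.sqrt(p_t * (1 - p_t) + p_c * (1 - p_c))) ** 2
    return math.ceil(numerator / (p_t - p_c) ** 2)

# Assumed reading of the example: control 75%, treated 85%.
n = n_per_group_two_proportions(p_c=0.75, p_t=0.85)
print(n)
```

Many textbooks use the average of the two proportions in the first square root instead of the control proportion, which gives a somewhat different n.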

  43. Hypothesis testing for two population means • $H_0: \mu_1 - \mu_2 = 0$ • $H_a: \mu_1 - \mu_2 \neq 0$ • $n/\text{group} = \dfrac{2\sigma^2 (Z_\alpha + Z_\beta)^2}{(\mu_1 - \mu_2)^2}$
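Applied to the vitamin D example from the earlier slides (difference of 0.5 mg per 100 ml, sd = 1.8 mg per 100 ml, 80% power, 5% significance), the two-means formula n = 2σ²(Zα + Zβ)² / (μ1 − μ2)² can be evaluated as below. A two-tailed test is assumed, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def n_per_group_two_means(sd, difference, alpha=0.05, power=0.80):
    """Per-group n for comparing two means: 2 * sd^2 * (z_a + z_b)^2 / difference^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed (assumption)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2 * sd ** 2 * (z_a + z_b) ** 2 / difference ** 2)

n = n_per_group_two_means(sd=1.8, difference=0.5)
print(n)
```

The result depends only on the ratio difference / sd, which is the standardised difference introduced with the example.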

  44. Difference between two related groups

  45. Sample size for dependent samples • $n_{\text{pair}} = \dfrac{(Z_\alpha + Z_\beta)^2 \sigma^2}{\delta^2}$ • $\sigma^2$ = variance of the within-pair difference • $\delta$ = the value of the mean difference that is clinically important to detect

  46. Association

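  47. Unmatched Case-Control Studies, Schlesselman (1982) • $n/\text{group} = \dfrac{\left[ Z_\alpha \sqrt{2P_0(1-P_0)} + Z_\beta \sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^2}{(P_1 - P_2)^2}$ • where $P_1 = \dfrac{P_2\,\mathrm{OR}}{1 + P_2(\mathrm{OR} - 1)}$; $P_2$ = the proportion exposed in controls; OR = odds ratio; $P_0 = (P_1 + P_2)/2$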
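A sketch of this calculation, reading the formula as [Zα √(2P0(1−P0)) + Zβ √(P1(1−P1) + P2(1−P2))]² / (P1 − P2)² with P1 = P2·OR / (1 + P2(OR − 1)). The inputs (30% of controls exposed, odds ratio 2, two-tailed 5% significance, 80% power) are hypothetical values chosen only to show the arithmetic.

```python
import math
from statistics import NormalDist

def n_per_group_case_control(p2, odds_ratio, alpha=0.05, power=0.80):
    """Per-group n for an unmatched case-control study."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed (assumption)
    z_b = NormalDist().inv_cdf(power)
    p1 = p2 * odds_ratio / (1 + p2 * (odds_ratio - 1))  # exposure among cases
    p0 = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p0 * (1 - p0))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical inputs: 30% of controls exposed, odds ratio of 2.
n = n_per_group_case_control(p2=0.30, odds_ratio=2.0)
print(n)
```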

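  48. Matched Case-Control Studies • $N_{\text{pair}} = \dfrac{\left[ Z_\alpha \sqrt{2\bar{P}} + Z_\beta \sqrt{2P_1P_2/\bar{P}} \right]^2}{(P_1 - P_2)^2}$ • where $P_1 = m_a/N$; $P_2 = m_b/N$; $\bar{P} = (P_1 + P_2)/2$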

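  49. Cohort Studies • $n/\text{group} = \dfrac{\left[ Z_\alpha \sqrt{2P(1-P)} + Z_\beta \sqrt{P_1\left(1 + R - P_1(1 + R^2)\right)} \right]^2}{\left[P_1(1-R)\right]^2}$ • where $P_1$ = incidence in the non-exposed group; R = relative risk; $P = P_1(1+R)/2$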
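A sketch of the cohort calculation, reading the formula as [Zα √(2P(1−P)) + Zβ √(P1(1 + R − P1(1 + R²)))]² / [P1(1 − R)]² with P = P1(1 + R)/2. The inputs (10% incidence in the non-exposed, relative risk 2, two-tailed 5% significance, 80% power) are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_cohort(p1, relative_risk, alpha=0.05, power=0.80):
    """Per-group n for a cohort study, given non-exposed incidence and relative risk."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed (assumption)
    z_b = NormalDist().inv_cdf(power)
    r = relative_risk
    p_bar = p1 * (1 + r) / 2                    # average incidence of the two groups
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 + r - p1 * (1 + r ** 2)))) ** 2
    return math.ceil(numerator / (p1 * (1 - r)) ** 2)

# Hypothetical inputs: 10% incidence without exposure, relative risk of 2.
n = n_per_group_cohort(p1=0.10, relative_risk=2.0)
print(n)
```

The second square root is P1(1 − P1) + P2(1 − P2) rewritten with P2 = R·P1, so this is the two-proportion calculation expressed in terms of the relative risk.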
