
Introduction to Biostatistics


**1. **Introduction to Biostatistics
Ronald D. Warner, DVM, PhD
Assoc Professor, Clinical Preventive Medicine
Dept Family & Community Medicine
TTUHSC; Lubbock, TX

**3. **Human rabies cases - US, 1990-2000; listed by: age, gender, exposure/location, source, death

**4. **Summarizing and describing a data set: tabulations, plots, ‘centers’, & ‘spreads’
How do we reduce a large data set to a ‘manageable volume’?
Graphically: histograms/polygons, bar charts, box & whisker plots, scatter plots, etc.
Summary measures of central tendency: means, medians, modes
Summary measures of dispersion: std deviations; range, quartiles; five-number summary
Illustrations use the human rabies cases - US, 1990-2000

**5. **Descriptive epidemiology: Patterns of Disease Occurrence
Distribution of disease in populations: numerator (“event” count) / denominator (group “at risk”)
by “person”: age, race / ethnicity, gender, occupation, education, marital status, genetic marker, sexual preference
by “place”: residence (urban vs. rural), worksite, social event
by “time”: week, month, year; sporadic, seasonal, trends --- incubation period; latency

**6. **Pattern of “All - cause” Mortality ; by “person” : Age groupings

**7. **Pattern of disease Occurrence ; by “place” Rocky Mountain Spotted Fever

**8. **Pattern of disease Occurrence : by “Place”

**9. **Pattern of disease Occurrence : by “Time”

**10. **Patterns of disease Occurrence: Correlation of Population statistics
Ecologic (correlation) studies --- plot: disease (population) burden [Y axis] vs. prevalence of “risk factor” [X axis]
-- correlation coefficient: r; + or -
-- r-squared: % variability in Y “explained” by X
This is only a hypothesis-generating study design. * Beware of the ecologic fallacy when considering “results”.
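As a rough illustration of r and r-squared, the sketch below computes Pearson's correlation coefficient by hand. The two lists are hypothetical numbers invented for the example (they are not from the slides), and `pearson_r` is an illustrative helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# HYPOTHETICAL population-level data, one value per community.
risk_factor_prevalence = [10, 20, 30, 40, 50]   # X axis, % exposed
disease_burden = [12, 15, 21, 24, 33]           # Y axis, cases per 1,000

r = pearson_r(risk_factor_prevalence, disease_burden)
r_squared = r ** 2   # share of variability in Y "explained" by X
print(f"r = {r:.3f}, r-squared = {r_squared:.3f}")
```

The sign of r gives the direction of the association; because the inputs are group-level summaries, a strong r here says nothing certain about individuals, which is the ecologic fallacy the slide warns about.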

**11. **Descriptive epidemiology : pattern of occurrence Prevalence of HIV+ and community Mosquito index

**12. **Relationship between # dental caries & fluoride content of public water; adapted from Dean HT, et al. 1942. Pub Hlth Rep 57:1155-79

**13. **STEM & LEAF PLOTS ( ages of human rabies cases ) First, data should be plotted or tabulated: e.g.; ages: 4, 11, 14, 22, 27, 29, 42, 44, 55, 69, 74, 82, etc.
STEM | LEAF
0 | 4
1 | 1, 1, 3, 4
2 | 2, 4, 6, 6, 7, 8, 9, 9
3 | 0, 2
4 | 0, 1, 2, 2, 4, 7, 9, 9
5 | 4, 5
6 | 4, 5, 9, 9
7 | 1, 4
8 | 2
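A stem-and-leaf plot like this one can be built in a few lines. The sketch below is a minimal version that assumes whole-number ages and uses the tens digit as the stem; the `ages` list is the full set of 32 ages given on the measures-of-location slide, and `stem_and_leaf` is an illustrative helper name.

```python
from collections import defaultdict

# The 32 ages of human rabies cases listed in the deck.
ages = [4, 11, 11, 13, 14, 22, 24, 26, 26, 27, 28, 29, 29, 30, 32, 40,
        41, 42, 42, 44, 47, 49, 49, 54, 55, 64, 65, 69, 69, 71, 74, 82]

def stem_and_leaf(values):
    """Group each value's units digit (leaf) under its tens digit (stem)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(ages)
# Print every stem in range, so a stem with no leaves would still show as a gap.
for stem in range(min(plot), max(plot) + 1):
    leaves = ", ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```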

**14. **A distribution or population [of values] ( every group of ‘related’ values has a distribution )

**15. **Measures of location (central tendency); (“mid”-points): mean, median, mode
Ages of rabies cases: 4, 11, 11, 13, 14, 22, 24, 26, 26, 27, 28, 29, 29, 30, 32, 40, 41, 42, 42, 44, 47, 49, 49, 54, 55, 64, 65, 69, 69, 71, 74, and 82 (distribution of a continuous [discrete] variable)
What is the best measure of central tendency for these data?
The mean (arithmetic avg): sum of values / # (n) of observations; 1283 / 32 = 40.09375. Remember: the mean is very sensitive to ‘extreme’ values.
The median: the ‘middle most’ value: ... 32, 40, 41, 42 …; with this even # of observations, (40 + 41) / 2 = 40.5. Remember: the median is robust re: ‘extreme’ values.
The mode: the most common value: 11, 26, 29, 42, 49, or 69 ?? (each occurs twice). Remember: a mode is best used for categorical variables.
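These three measures can be checked with Python's standard `statistics` module. This is only a quick verification sketch using the 32 ages above; `multimode` is used because the data have several tied modes.

```python
import statistics

ages = [4, 11, 11, 13, 14, 22, 24, 26, 26, 27, 28, 29, 29, 30, 32, 40,
        41, 42, 42, 44, 47, 49, 49, 54, 55, 64, 65, 69, 69, 71, 74, 82]

mean_age = statistics.mean(ages)      # sum of values / n
median_age = statistics.median(ages)  # average of the two middle values when n is even
modes = statistics.multimode(ages)    # every value tied for most frequent

print(f"n = {len(ages)}, sum = {sum(ages)}")
print(f"mean = {mean_age}, median = {median_age}, modes = {modes}")
```

Six tied modes in a set of 32 values is a practical sign that the mode says little about the ‘center’ of these data.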

**16. **Measures of Dispersion (internal variation): std. deviation, range, quartiles, box & whisker plots
Two different distributions may have identical measures of central tendency but very different dispersions, and vice versa.
For means, the measure of dispersion is the std. deviation, roughly “the avg. ‘distance’ of any observation from the mean.” For ages of the rabies cases, the std. deviation (using the formula in texts or a calculator) = ± 20.864168.
For medians, the measure of dispersion is the inter-quartile range. For ages of the rabies cases, the range = 4 - 82; the 25th percentile (1st quartile) = 26, the 50th percentile (median) = 40.5, and the 75th percentile (3rd quartile) = 54.5, so the inter-quartile range = 54.5 - 26 = 28.5. The ‘five # summary’: 4, 26, 40.5, 54.5, 82; box & whisker plot:
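The figures on this slide can be reproduced as follows. The sketch assumes the sample std. deviation (n - 1 denominator) and takes each quartile as the median of the lower or upper half of the sorted data; other quartile conventions exist and give slightly different values. `five_number_summary` is an illustrative helper name.

```python
import statistics

ages = [4, 11, 11, 13, 14, 22, 24, 26, 26, 27, 28, 29, 29, 30, 32, 40,
        41, 42, 42, 44, 47, 49, 49, 54, 55, 64, 65, 69, 69, 71, 74, 82]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values.

    Q1 and Q3 are the medians of the lower and upper halves of the
    sorted data (the middle value is left out of both when n is odd).
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

sd = statistics.stdev(ages)  # sample standard deviation (n - 1)
minimum, q1, median, q3, maximum = five_number_summary(ages)
iqr = q3 - q1

print(f"std deviation = {sd:.6f}")
print(f"five-number summary = {minimum}, {q1}, {median}, {q3}, {maximum}")
print(f"inter-quartile range = {iqr}")
```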

**17. **a ‘normal’ or Gaussian distribution ( basis of statistical inference for many populations )

**18. **Are the frequencies of ‘rabies’ ages distributed “normally”?
68.3% of data w/i -1 s.d. to +1 s.d. of the mean? Not exactly. The {-1 s.d.; +1 s.d.} interval [19.23 yrs - 60.96 yrs] contains only twenty (62.5%) of the thirty-two values.
95.5% of data w/i -2 s.d. to +2 s.d. of the mean? Not exactly. The {-2 s.d.; +2 s.d.} interval [‘unborn’ (-1.63 yrs) - 81.82 yrs] contains thirty-one (96.9%) of the thirty-two values.
Based on median vs mean, this distribution is slightly skewed. However, we assume that the population from which these data arise has an approximately ‘normal’ distribution; so, we can ‘safely’ use the mean.
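The counts quoted on this slide can be verified directly. The sketch below counts how many of the 32 ages fall within k sample std. deviations of the mean; `share_within` is an illustrative helper name.

```python
import statistics

ages = [4, 11, 11, 13, 14, 22, 24, 26, 26, 27, 28, 29, 29, 30, 32, 40,
        41, 42, 42, 44, 47, 49, 49, 54, 55, 64, 65, 69, 69, 71, 74, 82]

mean = statistics.mean(ages)
sd = statistics.stdev(ages)  # sample standard deviation (n - 1)

def share_within(values, center, spread, k):
    """Count and fraction of values within center +/- k * spread."""
    low, high = center - k * spread, center + k * spread
    count = sum(low <= v <= high for v in values)
    return count, count / len(values)

count_1sd, share_1sd = share_within(ages, mean, sd, 1)
count_2sd, share_2sd = share_within(ages, mean, sd, 2)

print(f"within 1 s.d.: {count_1sd} of {len(ages)} ({share_1sd:.1%})")
print(f"within 2 s.d.: {count_2sd} of {len(ages)} ({share_2sd:.1%})")
```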

**19. **Std deviation vs std error of the mean (when do you use one, but not the other?)
The std. deviation is used when describing, i.e., quantifying, the variation around the mean of a sample. Std deviation is an important statistic when determining if two samples likely originated from the same underlying population.
Central limit theorem: “sample means are normally distributed” (approximately, once samples are reasonably large)
The std error of the mean is used when estimating the mean of the underlying population (from which the sample originated). Std error is the important statistic for calculating the confidence interval around your sample statistic (sample mean). It is determined by both the std deviation of the sample and the sample size (std error = std deviation / √n), so when sample size increases, the std error decreases, while the std deviation tends to settle near that of the underlying population.
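To make the distinction concrete, the sketch below computes the std error of the mean for the 32 rabies ages and an approximate 95% confidence interval. It uses the normal (z) critical value as a simplifying assumption; with n = 32 a t-based interval would be slightly wider.

```python
import math
import statistics

ages = [4, 11, 11, 13, 14, 22, 24, 26, 26, 27, 28, 29, 29, 30, 32, 40,
        41, 42, 42, 44, 47, 49, 49, 54, 55, 64, 65, 69, 69, 71, 74, 82]

n = len(ages)
mean = statistics.mean(ages)
sd = statistics.stdev(ages)          # describes spread of the sample
se = sd / math.sqrt(n)               # describes uncertainty in the mean

# ASSUMPTION: normal approximation; z is about 1.96 for a two-sided 95% interval.
z = statistics.NormalDist().inv_cdf(0.975)
ci_low, ci_high = mean - z * se, mean + z * se

print(f"std deviation = {sd:.2f}, std error = {se:.2f}")
print(f"approx. 95% CI for the mean: {ci_low:.2f} to {ci_high:.2f}")
```

Quadrupling the sample size would halve the std error, because n enters only through √n.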

**20. **Degrees of freedom, sample size, p-value, critical point(s), and stat “decision” (t & z) tables
Degrees of freedom: n - 1 ?; getting to “total” of study.
Sample size: generally, sample sizes < 30 are considered ‘unstable’ and ‘special’ statistical tests are required. Ref to readings:
p-value: judged against the amount of random error you are willing to accept (the significance level)
Critical point(s): point(s) in the frequency distribution that correspond to std deviations relating to the hypothesis being tested
Tables (t & z): tables that have been generated to “integrate” probability over a continuous frequency distribution.
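What a z table does can be sketched with the standard library's `NormalDist`, which integrates the standard normal curve. The choice of a 5% acceptable error and a two-sided test is an assumption made for the example, and `two_sided_p` is an illustrative helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, std deviation 1

# ASSUMPTION: 5% acceptable random error, two-sided test.
alpha = 0.05
z_critical = std_normal.inv_cdf(1 - alpha / 2)  # critical point, about 1.96

def two_sided_p(z):
    """Probability of a result at least this far from 0 in either tail."""
    return 2 * (1 - std_normal.cdf(abs(z)))

print(f"critical z for alpha = {alpha}: {z_critical:.3f}")
print(f"two-sided p for z = 2.5: {two_sided_p(2.5):.4f}")
```

A test statistic beyond the critical point has a p-value below the chosen error level, which is the ‘decision’ the tables support.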

**21. **Human rabies deaths - US, US exposures only, 1991-2000; listed by: age, gender, exposure/location, source, death

**22. **Frequency distribution of ‘91 - ‘00 rabies deaths, by gender, AND ‘91 - ‘95 rabies deaths, by gender and source of exposure

**23. **If we wish to compare difference (?) of Staph aureus in swimmers vs non-swimmers … display & analysis:

**24. **Data: This is analyzed by Chi-square statistic.
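The slide's data table is not reproduced here, so the sketch below uses HYPOTHETICAL counts purely to show how a chi-square statistic is computed for a 2 x 2 table (carriage by swimming exposure). It uses the Pearson statistic without continuity correction, and `chi_square_2x2` is an illustrative helper name.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    Every row and column total must be non-zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# HYPOTHETICAL counts, invented for illustration only:
#                 carrier   non-carrier
# swimmers           30          70
# non-swimmers       15          85
chi2, p_value = chi_square_2x2(30, 70, 15, 85)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

With small expected cell counts, an exact test is usually preferred over this approximation.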

**25. ** Compare difference of Staph aureus carriage, by swimming exposure.

**26. **the general Epidemiologic (scientific) Approach
1. Identify a PROBLEM: clinical suspicion; case series; review of medical literature
2. Formulate a HYPOTHESIS (asking the right question); good hypotheses are: Specific, Measurable, and Plausible
3. TEST that hypothesis (assumptions vs. type of data)
4. Always question the VALIDITY of the result(s): Chance; Bias; and Causality

**27. **the epidemiologic study: threats to Validity
Chance: role of random error in outcome measure(s) (p-value; power of the study and the confidence interval) --- largely determined by sample size
Bias: role of systematic error in outcome measure(s)
Selection bias - subjects not representative
Information bias - error(s) in subject data / classification
Confounding - 3rd variable (causal) assoc. w/ both X and Y