Descriptive Statistics • Tabular and Graphical Displays • Frequency Distribution - List of intervals of values for a variable, and the number of occurrences per interval • Relative Frequency - Proportion (often reported as a percentage) of observations falling in the interval • Histogram/Bar Chart - Graphical representation of a Relative Frequency distribution • Stem and Leaf Plot - Horizontal tabular display of data, based on 2 digits (stem/leaf)
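As a minimal sketch, the frequency distribution, relative frequencies, and a stem-and-leaf display can be computed in Python for a small set of hypothetical exam scores. The function names `frequency_table` and `stem_and_leaf`, the interval start and width, and the data are illustrative assumptions. The stem-and-leaf version assumes two-digit integers, with the tens digit as the stem and the units digit as the leaf.

```python
from collections import Counter

def frequency_table(data, start, width):
    """Return (low, high, count, relative frequency) for intervals [low, high)."""
    counts = Counter((x - start) // width for x in data)   # interval index for each value
    n = len(data)
    table = []
    for k in range(min(counts), max(counts) + 1):          # include empty intervals in between
        low = start + k * width
        table.append((low, low + width, counts[k], counts[k] / n))
    return table

def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted leaves (units digit); assumes two-digit integers."""
    plot = {}
    for x in sorted(data):
        plot.setdefault(x // 10, []).append(x % 10)
    return plot

scores = [52, 57, 61, 64, 64, 68, 71, 73, 75, 75, 78, 82, 85, 91]   # hypothetical exam scores

for low, high, count, rel in frequency_table(scores, start=50, width=10):
    print(f"[{low}, {high}): {count:2d}  {rel:.1%}")

for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", " ".join(map(str, leaves)))
```

The relative frequencies always sum to 1 (100%), which is a quick sanity check on any hand-built table.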
Comparing Groups • Side-by-side bar charts • 3-dimensional histograms • Back-to-back stem and leaf plots • Goal: Compare 2 (or more) groups with respect to the variable(s) being measured • Do measurements tend to differ among groups?
Sample & Population Distributions • Distributions of Samples and Populations - As samples get larger, the sample distribution gets smoother and looks more like the population distribution • U-shaped - Measurements tend to be large or small, with fewer in the middle range of values • Bell-shaped - Measurements tend to cluster around the middle with few extremes (symmetric) • Skewed Right - A few extremely large values (long right tail) • Skewed Left - A few extremely small values (long left tail)
Measures of Central Tendency • Mean - Sum of all measurements divided by the number of observations (the value each case would have if the total were distributed evenly among cases). Can be highly influenced by extreme values. • Notation: Sample measurements are labeled $Y_1,\ldots,Y_n$, and the sample mean is $\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$
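The mean's definition, and its sensitivity to a single extreme value, can be checked directly. The name `sample_mean` and the income figures (in thousands of dollars) are hypothetical.

```python
def sample_mean(ys):
    """Sample mean: (Y1 + ... + Yn) / n."""
    return sum(ys) / len(ys)

incomes = [30, 32, 35, 38, 40]   # hypothetical incomes, in thousands of dollars

print(sample_mean(incomes))            # 35.0
print(sample_mean(incomes + [400]))    # one extreme value pulls the mean up to about 95.8
```

Adding one very large income moves the mean far above every one of the original five values, which is why the mean alone can be misleading for skewed data.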
Median, Percentiles, Mode • Median - Middle measurement after the data have been ordered from smallest to largest (the average of the two middle measurements when n is even). Appropriate for interval and ordinal scales • Pth percentile - Value below which P% of measurements fall and above which (100-P)% lie. The lower quartile (25th), median (50th), and upper quartile (75th) are often reported • Mode - Most frequently occurring outcome. Typically reported for ordinal and nominal data.
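Python's standard `statistics` module provides these summaries. Software packages use slightly different conventions for interpolating percentiles; `statistics.quantiles` defaults to its "exclusive" method, so its lower and upper quartiles may differ a little from other tools. The ratings below are hypothetical ordinal data on a 1-5 scale.

```python
import statistics

ratings = [3, 1, 4, 4, 5, 2, 4, 3]   # hypothetical ordinal ratings on a 1-5 scale

median = statistics.median(ratings)               # n = 8 is even: average of 3 and 4 -> 3.5
q1, q2, q3 = statistics.quantiles(ratings, n=4)   # lower quartile, median, upper quartile
modes = statistics.multimode(ratings)             # most frequent value(s) -> [4]

print(median, (q1, q2, q3), modes)
```

`multimode` returns a list because a data set can have more than one mode.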
Measures of Variation • Measures of how similar or different individuals' measurements are • Range -- Largest observation minus smallest observation • Deviation -- Difference between the ith individual's outcome and the sample mean: $Y_i - \bar{Y}$ • Variance of n observations $Y_1,\ldots,Y_n$ is the “average” squared deviation: $s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}$
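A minimal sketch of the range, the deviations, and the sample variance with the n-1 divisor, assuming a small hypothetical data set. The deviations always sum to zero, which is why they are squared before averaging. The standard library's `statistics.variance` also divides by n-1 and serves as a cross-check; `deviations` and `sample_variance` are illustrative names.

```python
import statistics

def deviations(ys):
    """Deviation of each observation from the sample mean: Yi - Ybar."""
    ybar = sum(ys) / len(ys)
    return [y - ybar for y in ys]

def sample_variance(ys):
    """s^2: sum of squared deviations divided by n - 1."""
    return sum(d * d for d in deviations(ys)) / (len(ys) - 1)

ys = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical measurements
data_range = max(ys) - min(ys)          # 9 - 2 = 7
devs = deviations(ys)                   # [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
s2 = sample_variance(ys)                # 32 / 7, about 4.571

print(data_range, devs, sum(devs), s2, statistics.variance(ys))
```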
Measures of Variation • Standard Deviation - Positive square root of the variance (measured in the original units): $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}}$ • Properties of the standard deviation: • s ≥ 0, and s = 0 only if all observations are equal • s increases with the amount of variation around the mean • Division by n-1 (not n) is due to technical reasons (later) • s depends on the units of the data (e.g. thousands of dollars vs dollars)
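A short sketch of the standard deviation and two of the listed properties: s is 0 when all observations are equal, and changing the units rescales s by the same factor. The salary figures and the name `sample_sd` are hypothetical.

```python
import math

def sample_sd(ys):
    """s = square root of the sample variance (n - 1 divisor)."""
    n = len(ys)
    ybar = sum(ys) / n
    return math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))

salaries_k = [42, 48, 51, 55, 64]              # hypothetical salaries, thousands of dollars
salaries = [1000 * y for y in salaries_k]      # the same salaries in dollars

print(sample_sd(salaries_k))   # about 8.216
print(sample_sd(salaries))     # about 8215.8: 1000 times larger
print(sample_sd([7, 7, 7]))    # 0.0: no variation
```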
Empirical Rule • If the histogram of the data is approximately bell-shaped, then: • Approximately 68% of measurements lie within 1 standard deviation of the mean. • Approximately 95% of measurements lie within 2 standard deviations of the mean. • Virtually all of the measurements lie within 3 standard deviations of the mean.
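A quick simulation can check the empirical rule: draw bell-shaped (normal) data with `random.gauss`, then count the share of observations within 1, 2, and 3 standard deviations of the mean. The mean of 100, SD of 15, sample size, and seed are arbitrary assumptions, and `share_within` is a hypothetical helper; the shares should land near 68%, 95%, and nearly 100%.

```python
import random
import statistics

def share_within(ys, k):
    """Proportion of observations within k sample standard deviations of the sample mean."""
    ybar = statistics.fmean(ys)
    s = statistics.stdev(ys)
    return sum(abs(y - ybar) <= k * s for y in ys) / len(ys)

random.seed(1)                                        # arbitrary seed for reproducibility
ys = [random.gauss(100, 15) for _ in range(10_000)]   # simulated bell-shaped data

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(ys, k):.3f}")
```

For clearly skewed or U-shaped data these shares can be far from 68% and 95%, which is why the rule is stated only for approximately bell-shaped histograms.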
Other Measures and Plots • Interquartile Range (IQR) -- 75th percentile minus 25th percentile (measures the spread of the middle 50% of the data) • Box Plots - Display a box containing the middle 50% of measurements, with a line at the median and lines (whiskers) extending from the box. The quartiles break the data into four parts • Outliers - Observations falling more than 1.5 × IQR above the upper quartile or below the lower quartile
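The 1.5 × IQR outlier rule can be sketched with `statistics.quantiles`. Because quartile conventions differ between software packages, the exact IQR may vary slightly from tool to tool; the data and the helper `outliers_iqr` are hypothetical.

```python
import statistics

def outliers_iqr(ys):
    """Return the IQR and the observations more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(ys, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return iqr, [y for y in ys if y < low_fence or y > high_fence]

ys = [3, 5, 6, 7, 7, 8, 9, 10, 11, 30]   # hypothetical data with one suspicious value
iqr, flagged = outliers_iqr(ys)
print(iqr, flagged)
```

In a box plot, points flagged this way are typically drawn individually beyond the whiskers.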
Dependent and Independent Variables • Dependent variables are outcomes of interest to investigators. Also referred to as Responses or Endpoints • Independent variables are Factors that are often hypothesized to affect the outcomes (levels of dependent variables). Also referred to as Predictor or Explanatory Variables • Research Question: Does the I.V. affect the D.V.?
Example - Clinical Trials of Cialis • Clinical trials conducted worldwide to study the efficacy and safety of Cialis (Tadalafil) for erectile dysfunction (ED) • Patients randomized to Placebo, 10mg, and 20mg • Co-Primary outcomes: • Change from baseline in the erectile function domain of the International Index of Erectile Function (Numeric) • Response to: “Were you able to insert your P… into your partner’s V…?” (Nominal: Yes/No) • Response to: “Did your erection last long enough for you to have successful intercourse?” (Nominal: Yes/No) Source: Carson, et al. (2004).
Example - Clinical Trials of Cialis • Population: All adult males suffering from erectile dysfunction • Sample: 2102 men with mild-to-severe ED in 11 randomized clinical trials • Dependent Variable(s): Co-primary outcomes listed on previous slide • Independent Variable: Cialis Dose: (0, 10, 20 mg) • Research Questions: Does use of Cialis improve erectile function?
Sample Statistics/Population Parameters • The sample mean and standard deviation are the most commonly reported summaries of sample data. They are random variables, since they change from one sample to another. • The population mean (μ) and standard deviation (σ), computed from a population of measurements, are fixed (unknown in practice) values called parameters.
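To illustrate the distinction, the sketch below builds a hypothetical, fully known population, computes its parameters once, and then draws several random samples. In practice the population would not be available, so μ and σ would be unknown; the population size, sample size of 25, and seed are assumptions made purely for illustration.

```python
import random
import statistics

random.seed(2)                                               # arbitrary seed
population = [random.gauss(50, 10) for _ in range(10_000)]   # hypothetical population

mu = statistics.fmean(population)       # population mean: a fixed parameter
sigma = statistics.pstdev(population)   # population SD (divides by N, not N - 1)

# Sample statistics: recomputed for each new random sample of size 25
samples = [random.sample(population, 25) for _ in range(5)]
means = [statistics.fmean(s) for s in samples]
sds = [statistics.stdev(s) for s in samples]

for m, s in zip(means, sds):
    print(round(m, 2), round(s, 2))
print("mu =", round(mu, 2), "sigma =", round(sigma, 2))
```

Each run of the loop gives a different sample mean and standard deviation, scattered around the single fixed values of μ and σ.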