Download Presentation
## Looking at Data

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Clinical Data Example**• 1. Kline et al. (2002) • The researchers analyzed data from 934 emergency room patients with suspected pulmonary embolism (PE). Only about 1 in 5 actually had PE. The researchers wanted to know what clinical factors predicted PE. • I will use four variables from their dataset today: • Pulmonary embolism (yes/no) • Age (years) • Shock index = heart rate/systolic BP • Shock index categories = take shock index and divide it into 10 groups (lowest to highest shock index)**Types of Variables: Overview**Categorical Quantitative binary nominal ordinal discrete continuous 2 categories + more categories + order matters + numerical + uninterrupted**Categorical Variables**• Also known as “qualitative.” • Categories. • treatment groups • exposure groups • disease status**Categorical Variables**• Dichotomous (binary)– two levels • Dead/alive • Treatment/placebo • Disease/no disease • Exposed/Unexposed • Heads/Tails • Pulmonary Embolism (yes/no) • Male/female**Categorical Variables**• Nominal variables– Named categories Order doesn’t matter! • The blood type of a patient (O, A, B, AB) • Marital status • Occupation**Categorical Variables**• Ordinal variable– Ordered categories. Order matters! • Staging in breast cancer as I, II, III, or IV • Birth order—1st, 2nd, 3rd, etc. • Letter grades (A, B, C, D, F) • Ratings on a scale from 1-5 • Ratings on: always; usually; many times; once in a while; almost never; never • Age in categories (10-20, 20-30, etc.) • Shock index categories (Kline et al.)**Quantitative Variables**• Numerical variables; may be arithmetically manipulated. • Counts • Time • Age • Height**Quantitative Variables**• Discrete Numbers– a limited set of distinct values, such as whole numbers. • Number of new AIDS cases in CA in a year (counts) • Years of school completed • The number of children in the family (cannot have a half a child!) • The number of deaths in a defined time period (cannot have a partial death!) • Roll of a die**Quantitative Variables**• Continuous Variables - Can take on any number within a defined range. • Time-to-event (survival time) • Age • Blood pressure • Serum insulin • Speed of a car • Income • Shock index (Kline et al.)**Looking at Data**• ü How are the data distributed? • Where is the center? • What is the range? • What’s the shape of the distribution (e.g., Gaussian, binomial, exponential, skewed)? • ü Are there “outliers”? • ü Are there data points that don’t make sense?**The first rule of statistics: USE COMMON SENSE!90% of the**information is contained in the graph.**Frequency Plots (univariate)**Categorical variables • Bar Chart Continuous variables • Box Plot • Histogram**Bar Chart**• Used for categorical variables to show frequency or proportion in each category. • Translate the data from frequency tables into a pictorial representation…**Bar Chart for SI categories**200.0 183.3 166.7 150.0 133.3 116.7 Number of Patients 100.0 83.3 66.7 50.0 33.3 16.7 0.0 1 2 3 4 5 6 7 8 9 10 Shock Index Category Much easier to extract information from a bar chart than from a table!**Box plot and histograms: for continuous variables**• To show the distribution (shape, center, range, variation) of continuous variables.**Box Plot: Shock Index**2.0 1.3 Q3 + 1.5IQR = .8+1.5(.25)=1.175 maximum (1.7) minimum (or Q1-1.5IQR) Outliers median (.66) Shock Index Units “whisker” 75th percentile (0.8) interquartile range (IQR) = .8-.55 = .25 0.7 25th percentile (0.55) 0.0 SI**Histogram of SI**25.0 16.7 Percent 8.3 0.0 0.0 0.7 1.3 2.0 SI Bins of size 0.1 Note the “right skew”**Box Plot: Shock Index**2.0 1.3 Shock Index Units 0.7 0.0 SI Also shows the “right skew”**maximum**minimum median 75th percentile interquartile range 25th percentile Box Plot: Age 100.0 More symmetric 66.7 Years 33.3 0.0 AGE Variables**14.0**9.3 Percent 4.7 0.0 0.0 33.3 66.7 100.0 AGE (Years) Histogram: Age Not skewed, but not bell-shaped either…**Some histograms from your class (n=24)**Starting with politics…**Measures of central tendency**• Mean • Median • Mode**Central Tendency**• Mean– the average; the balancing point calculation: the sum of values divided by the sample size In math shorthand:**Mean: example**Some data: Age of participants: 17 19 21 22 23 23 23 38**14.0**Percent 9.3 4.7 0.0 0.0 33.3 66.7 100.0 Mean of age in Kline’s data • Means Section of AGE • Geometric Harmonic • Parameter Mean Median Mean Mean Sum Mode • Value 50.19334 49 46.66865 43.00606 46730 49 • 556.9546**14.0**9.3 Percent 4.7 0.0 0.0 33.3 66.7 100.0 The balancing point Mean of age in Kline’s data**Mean of Pulmonary Embolism? (Binary variable?)**80.56% (750) 19.44% (181)**Mean**• The mean is affected by extreme values (outliers) 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Mean = 3 Mean = 4 • Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall**Central Tendency**• Median– the exact middle value Calculation: • If there are an odd number of observations, find the middle value • If there are an even number of observations, find the middle two values and average them.**Median: example**Some data: Age of participants: 17 19 21 22 23 23 23 38 Median = (22+23)/2 = 22.5**14.0**Percent 9.3 4.7 0.0 0.0 33.3 66.7 100.0 AGE (Years) Median of age in Kline’s data • Means Section of AGE • Geometric Harmonic • Parameter Mean Median Mean Mean Sum Mode • Value 50.19334 49 46.66865 43.00606 46730 49**14.0**50% of mass 50% of mass 9.3 Percent 4.7 0.0 0.0 33.3 66.7 100.0 Median of age in Kline’s data**Does PE have a median?**• Yes, if you line up the 0’s and 1’s, the middle number is 0.**Median**• The median is not affected by extreme values (outliers). 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Median = 3 Median = 3 • SSlide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall**Central Tendency**• Mode– the value that occurs most frequently**Mode: example**Some data: Age of participants: 17 19 21 22 23 23 23 38 Mode = 23 (occurs 3 times)**Mode of age in Kline’s data**• Means Section of AGE • Geometric Harmonic • Parameter Mean Median Mean Mean Sum Mode • Value 50.19334 49 46.66865 43.00606 46730 49**Mode of PE?**• 0 appears more than 1, so 0 is the mode.**Measures of Variation/Dispersion**• Range • Percentiles/quartiles • Interquartile range • Standard deviation/Variance