1 / 104

1.06k likes | 1.31k Views

Looking at Data. Clinical Data Example . 1. Kline et al. (2002) The researchers analyzed data from 934 emergency room patients with suspected pulmonary embolism (PE). Only about 1 in 5 actually had PE. The researchers wanted to know what clinical factors predicted PE.

Download Presentation
## Looking at Data

**An Image/Link below is provided (as is) to download presentation**
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.
Content is provided to you AS IS for your information and personal use only.
Download presentation by click this link.
While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

**Clinical Data Example**• 1. Kline et al. (2002) • The researchers analyzed data from 934 emergency room patients with suspected pulmonary embolism (PE). Only about 1 in 5 actually had PE. The researchers wanted to know what clinical factors predicted PE. • I will use four variables from their dataset today: • Pulmonary embolism (yes/no) • Age (years) • Shock index = heart rate/systolic BP • Shock index categories = take shock index and divide it into 10 groups (lowest to highest shock index)**Types of Variables: Overview**Categorical Quantitative binary nominal ordinal discrete continuous 2 categories + more categories + order matters + numerical + uninterrupted**Categorical Variables**• Also known as “qualitative.” • Categories. • treatment groups • exposure groups • disease status**Categorical Variables**• Dichotomous (binary)– two levels • Dead/alive • Treatment/placebo • Disease/no disease • Exposed/Unexposed • Heads/Tails • Pulmonary Embolism (yes/no) • Male/female**Categorical Variables**• Nominal variables– Named categories Order doesn’t matter! • The blood type of a patient (O, A, B, AB) • Marital status • Occupation**Categorical Variables**• Ordinal variable– Ordered categories. Order matters! • Staging in breast cancer as I, II, III, or IV • Birth order—1st, 2nd, 3rd, etc. • Letter grades (A, B, C, D, F) • Ratings on a scale from 1-5 • Ratings on: always; usually; many times; once in a while; almost never; never • Age in categories (10-20, 20-30, etc.) • Shock index categories (Kline et al.)**Quantitative Variables**• Numerical variables; may be arithmetically manipulated. • Counts • Time • Age • Height**Quantitative Variables**• Discrete Numbers– a limited set of distinct values, such as whole numbers. • Number of new AIDS cases in CA in a year (counts) • Years of school completed • The number of children in the family (cannot have a half a child!) • The number of deaths in a defined time period (cannot have a partial death!) • Roll of a die**Quantitative Variables**• Continuous Variables - Can take on any number within a defined range. • Time-to-event (survival time) • Age • Blood pressure • Serum insulin • Speed of a car • Income • Shock index (Kline et al.)**Looking at Data**• ü How are the data distributed? • Where is the center? • What is the range? • What’s the shape of the distribution (e.g., Gaussian, binomial, exponential, skewed)? • ü Are there “outliers”? • ü Are there data points that don’t make sense?**The first rule of statistics: USE COMMON SENSE!90% of the**information is contained in the graph.**Frequency Plots (univariate)**Categorical variables • Bar Chart Continuous variables • Box Plot • Histogram**Bar Chart**• Used for categorical variables to show frequency or proportion in each category. • Translate the data from frequency tables into a pictorial representation…**Bar Chart for SI categories**200.0 183.3 166.7 150.0 133.3 116.7 Number of Patients 100.0 83.3 66.7 50.0 33.3 16.7 0.0 1 2 3 4 5 6 7 8 9 10 Shock Index Category Much easier to extract information from a bar chart than from a table!**Box plot and histograms: for continuous variables**• To show the distribution (shape, center, range, variation) of continuous variables.**Box Plot: Shock Index**2.0 1.3 Q3 + 1.5IQR = .8+1.5(.25)=1.175 maximum (1.7) minimum (or Q1-1.5IQR) Outliers median (.66) Shock Index Units “whisker” 75th percentile (0.8) interquartile range (IQR) = .8-.55 = .25 0.7 25th percentile (0.55) 0.0 SI**Histogram of SI**25.0 16.7 Percent 8.3 0.0 0.0 0.7 1.3 2.0 SI Bins of size 0.1 Note the “right skew”**Box Plot: Shock Index**2.0 1.3 Shock Index Units 0.7 0.0 SI Also shows the “right skew”**maximum**minimum median 75th percentile interquartile range 25th percentile Box Plot: Age 100.0 More symmetric 66.7 Years 33.3 0.0 AGE Variables**14.0**9.3 Percent 4.7 0.0 0.0 33.3 66.7 100.0 AGE (Years) Histogram: Age Not skewed, but not bell-shaped either…**Some histograms from your class (n=24)**Starting with politics…**Measures of central tendency**• Mean • Median • Mode**Central Tendency**• Mean– the average; the balancing point calculation: the sum of values divided by the sample size In math shorthand:**Mean: example**Some data: Age of participants: 17 19 21 22 23 23 23 38**14.0**Percent 9.3 4.7 0.0 0.0 33.3 66.7 100.0 Mean of age in Kline’s data • Means Section of AGE • Geometric Harmonic • Parameter Mean Median Mean Mean Sum Mode • Value 50.19334 49 46.66865 43.00606 46730 49 • 556.9546**14.0**9.3 Percent 4.7 0.0 0.0 33.3 66.7 100.0 The balancing point Mean of age in Kline’s data**Mean of Pulmonary Embolism? (Binary variable?)**80.56% (750) 19.44% (181)**Mean**• The mean is affected by extreme values (outliers) 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Mean = 3 Mean = 4 • Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall**Central Tendency**• Median– the exact middle value Calculation: • If there are an odd number of observations, find the middle value • If there are an even number of observations, find the middle two values and average them.**Median: example**Some data: Age of participants: 17 19 21 22 23 23 23 38 Median = (22+23)/2 = 22.5**14.0**Percent 9.3 4.7 0.0 0.0 33.3 66.7 100.0 AGE (Years) Median of age in Kline’s data • Means Section of AGE • Geometric Harmonic • Parameter Mean Median Mean Mean Sum Mode • Value 50.19334 49 46.66865 43.00606 46730 49**14.0**50% of mass 50% of mass 9.3 Percent 4.7 0.0 0.0 33.3 66.7 100.0 Median of age in Kline’s data**Does PE have a median?**• Yes, if you line up the 0’s and 1’s, the middle number is 0.**Median**• The median is not affected by extreme values (outliers). 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Median = 3 Median = 3 • SSlide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall**Central Tendency**• Mode– the value that occurs most frequently**Mode: example**Some data: Age of participants: 17 19 21 22 23 23 23 38 Mode = 23 (occurs 3 times)**Mode of age in Kline’s data**• Means Section of AGE • Geometric Harmonic • Parameter Mean Median Mean Mean Sum Mode • Value 50.19334 49 46.66865 43.00606 46730 49**Mode of PE?**• 0 appears more than 1, so 0 is the mode.**Measures of Variation/Dispersion**• Range • Percentiles/quartiles • Interquartile range • Standard deviation/Variance

More Related