Descriptive Statistics: Part One

Farrokh Alemi Ph.D.

Kashif Haqqi M.D.

Objectives

Definitions

Sampling methods

Types of variables

Reliability and validity

Average

Median

Mode

- Define validity and reliability and explain the role of each in assessing the quality of data.
- Distinguish among nominal, ordinal, and numeric data, as well as discrete and continuous data.
- Given a set of numerical data, calculate the mean, median and mode, and state the relative advantages of each as a measure of central tendency.


- A variable is an attribute of a person or an object that varies.
- Measurement is a set of rules for assigning numbers to objects to represent quantities of attributes.


- Statistics is the science of describing or making inferences about the world from a sample of data.
- Descriptive statistics are numerical estimates that organize, summarize, or present the data.
- Inferential statistics is the process of inferring from a sample to the population.

- A datum is one observation of the variable being measured.
- Data are a collection of observations.
- A population consists of all subjects about whom the study is being conducted.
- A sample is a subgroup of the population that is actually examined.

- Random sample: every subject in the population has an equal chance of inclusion in the study.
- Systematic sampling: selecting every kth subject from a numbered list.
- Stratified sample: random sampling within pre-defined groups of subjects.
- Staged sampling: a small random sample is drawn first; if its results are ambiguous, a larger random sample is collected.
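The first three sampling methods can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not the authors' procedure: the patient IDs, the two-ward grouping, the sample sizes, and the helper names (`simple_random_sample`, `systematic_sample`, `stratified_sample`, `ward`) are all hypothetical. The systematic sample assumes a random start within the first k subjects, a common convention the slide does not state. Staged sampling is left out because its rule for "ambiguous" results depends on the study.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, rng):
    # Every subject has the same chance of selection (sampling without replacement).
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    # Assumed convention: random start among the first k subjects, then every kth subject.
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    # Group subjects into pre-defined strata, then sample at random within each stratum.
    strata = defaultdict(list)
    for subject in population:
        strata[stratum_of(subject)].append(subject)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def ward(patient_id):
    # Hypothetical grouping: IDs 1-60 are on ward A, the rest on ward B.
    return "A" if patient_id <= 60 else "B"


if __name__ == "__main__":
    rng = random.Random(42)         # fixed seed so the run is repeatable
    patients = list(range(1, 101))  # hypothetical patient IDs 1..100
    print(simple_random_sample(patients, 5, rng))
    print(systematic_sample(patients, 10, rng))
    print(stratified_sample(patients, ward, 3, rng))
```

Passing an explicit `random.Random` object keeps each draw reproducible when a seed is given, which is convenient when a sample must be documented.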


- A discrete variable has gaps between its values. For example, sex is a discrete variable. If male is 1 and female is 0, values in between have no meaning.
- A continuous variable has no gaps between its values. All values or fractions of values have meaning. Age is an example of a continuous variable.


- A nominal scale assigns numbers to an attribute only to name the category. The numbers have no meaning by themselves, e.g., DRG code.
- An ordinal scale assigns numbers so that more of an attribute receives a higher value, e.g., severity.
- In an interval scale, the interval between the numbers has meaning, e.g., the Fahrenheit scale.
- A ratio scale is an interval scale where zero has a true meaning, e.g., age.
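A short calculation shows why a true zero matters. On a ratio scale such as age, the ratio of two values stays the same when the units change. On the Fahrenheit scale it does not, because Fahrenheit's zero is arbitrary. The sketch below converts Fahrenheit to Kelvin, an absolute scale whose zero is a true zero, and converts age from years to months. The values 40 °F, 80 °F, 20 years and 40 years are chosen only for illustration.

```python
def fahrenheit_to_kelvin(f):
    # Kelvin has a true zero (absolute zero), so ratios of Kelvin values are meaningful.
    return (f - 32) * 5 / 9 + 273.15


def years_to_months(years):
    # Age is a ratio scale: changing units only rescales, so ratios are preserved.
    return years * 12


# Interval scale: the ratio depends on where zero happens to sit.
print(80 / 40)                                              # 2.0 on the Fahrenheit scale
print(fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40))  # about 1.08 on the Kelvin scale

# Ratio scale: a 40-year-old is twice as old as a 20-year-old in any unit.
print(40 / 20)                                    # 2.0 in years
print(years_to_months(40) / years_to_months(20))  # 2.0 in months
```

So 80 °F is not "twice as hot" as 40 °F. A 40-year-old, however, is twice as old as a 20-year-old whether age is recorded in years or months.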


To Be Valid You Must Have a Reliable Measure. But You Can Have an Invalid Measure That Is Reliable.

- The next page shows a table from Hayward RA, McMahon LF, Bernard AM. Evaluating the care of general medicine inpatients: how good is implicit review? Annals of Internal Medicine. 1993;118(7):550-556.
- Two reviewers rated the quality of health care delivered in the same case. The table shows their inter-rater reliability.
- 00000605-199304010-00010.

- The mean, or arithmetic average, is found by adding the values and dividing by the number of values. The mean of 3 and 4 is 3.5.
- The geometric average is found by multiplying the values together and raising the product to the power of one divided by the number of values. The geometric average of 3 and 4 is the square root of 3 times 4, about 3.46.
- Can you calculate the mean and geometric average for 3, 4, and 5?
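In symbols, for n values x_1, ..., x_n, the two averages described above can be written as follows. The geometric average is defined here for positive values.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
G = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}
\\[6pt]
\text{For 3 and 4:}\quad \bar{x} = \frac{3 + 4}{2} = 3.5,
\qquad
G = (3 \times 4)^{1/2} = \sqrt{12} \approx 3.46
```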


- The mean of 3, 4 and 5 is the sum of these numbers divided by 3: (3 + 4 + 5) / 3 = 4.
- The geometric average of 3, 4 and 5 is the cube root of 3 times 4 times 5. To calculate the cube root in Excel, write a formula like =(3*4*5)^(1/3).
- The answer is about 3.91. Open Excel and verify that you can do this.
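The same arithmetic can be checked outside Excel. The Python sketch below computes both averages for 3, 4, and 5 with the standard `statistics` module. `statistics.geometric_mean` and `math.prod` need Python 3.8 or later. The sketch also shows how an exponent of 0.33 falls short of the exact cube root. In Excel, `=AVERAGE(3,4,5)` and `=GEOMEAN(3,4,5)` give the two averages directly.

```python
import math
import statistics

values = [3, 4, 5]

mean = statistics.mean(values)           # (3 + 4 + 5) / 3 = 4
geo = statistics.geometric_mean(values)  # cube root of 3 * 4 * 5 = 60

# Geometric mean by hand: the product raised to the power 1/n.
by_hand = math.prod(values) ** (1 / len(values))

# Using 0.33 in place of 1/3 gives a visibly smaller answer.
truncated = math.prod(values) ** 0.33

print(mean)                 # 4
print(round(geo, 2))        # 3.91
print(round(by_hand, 2))    # 3.91
print(round(truncated, 2))  # 3.86
```

Writing the exponent as `1 / len(values)` rather than a rounded decimal avoids this kind of error for any number of values.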

- A geometric average is used when averaging probabilities.
- A mean is used in most other contexts.

- The median is the halfway point in an ordered data set: half of the observations lie below it and half above.
- To calculate the median, arrange the data in order. Divide the number of values by 2 and round down. Skip that many values and use the next value as the median. If the number of values is even, the median is instead the average of the two middle values.
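The counting procedure can be written directly as code. This sketch follows the slide's steps: sort, take half the number of values rounded down, and skip that many. It also handles an even number of values, where the median is the average of the two middle values. The function name `median_by_counting` is hypothetical. Python's `statistics.median` applies the same rules and appears here only as a cross-check.

```python
import statistics


def median_by_counting(data):
    # Step 1: arrange the data in order.
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    # Step 2: half the number of values, rounded down.
    half = n // 2
    if n % 2 == 1:
        # Odd n: skip `half` values; the next value (index `half`) is the median.
        return ordered[half]
    # Even n: there is no single middle value, so average the two middle ones.
    return (ordered[half - 1] + ordered[half]) / 2


ages = [23, 45, 56, 23, 34, 65, 25]
print(median_by_counting(ages))          # 34
print(statistics.median(ages))           # 34
print(median_by_counting([3, 4, 5, 6]))  # 4.5
```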


- The median age of 7 patients (23, 45, 56, 23, 34, 65, 25) is found as follows:
- Order the list of values: 23, 23, 25, 34, 45, 56, 65.
- There are 7 observations. Divide 7 by 2 and round down to get 3.
- Skip the first 3 values; the median is the next one. In this example, 34 is the median.
- Do this in Excel, for example with =MEDIAN(23,45,56,23,34,65,25).

- The most frequent value observed is the mode.
- Mode is always an observed value in the data set.
- To calculate the mode, count how many times each value occurs. The value that occurs most often is the mode.
- Do this in Excel, for example with =MODE(23,45,56,23,34,65,25).
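Counting repetitions is exactly what `collections.Counter` does, so the mode can be sketched as follows. The helper name `modes` is hypothetical. It returns every value tied for the highest count, because a data set may have more than one mode.

```python
from collections import Counter


def modes(data):
    # Count how many times each value occurs.
    counts = Counter(data)
    # max() raises ValueError on empty data, since there is nothing to count.
    highest = max(counts.values())
    # Keep every value that reaches the highest count (ties give several modes).
    return [value for value, count in counts.items() if count == highest]


print(modes([23, 45, 56, 23, 34, 65, 25]))  # [23]
print(modes([1, 1, 2, 2, 3]))               # [1, 2]
```

Python's `statistics.multimode` (3.8+) behaves the same way. Note that when every value occurs once, every value counts as a mode under this rule.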


- Age data: 23, 45, 56, 23, 34, 65, 25.
- 23 is repeated twice.
- All other values are repeated once.
- The mode is 23.

- The mode, median and mean can be three different numbers when the distribution of the data is asymmetric (skewed).
- Any data set has only one mean and one median, but it may have several modes.
- The median is less influenced by extreme values than the mean is.
- The mean is often not one of the observed values; the median is guaranteed to be an observed value only when the number of observations is odd; the mode is always an observed value.
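These contrasts can be seen on the age data used earlier. The sketch below uses Python's `statistics` module to compute all three measures. It then replaces the oldest age, 65, with a hypothetical extreme value of 650, such as a data-entry error, to show that the mean shifts sharply while the median and mode stay the same.

```python
import statistics

ages = [23, 45, 56, 23, 34, 65, 25]
# Hypothetical outlier: 65 mistyped as 650.
ages_with_outlier = [23, 45, 56, 23, 34, 650, 25]

for label, data in [("original", ages), ("with outlier", ages_with_outlier)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    print(f"{label}: mean={mean:.2f}, median={median}, mode={mode}")

# original: mean=38.71, median=34, mode=23
# with outlier: mean=122.29, median=34, mode=23
```

The mean of the original data, about 38.71, is not one of the observed ages. The median (34) and the mode (23) are.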