# Introduction to Categorical Descriptive Statistics

### Overview

• Contingency tables

• Notation

• Descriptive statistics

• Difference in proportions

• Relative risk

• Odds ratio

• SPSS

### Contingency Tables

• Two dimensional tables

• Let X and Y be categorical variables

• X has I levels and Y has J levels

• A contingency or cross-classification table is a tabular representation of the frequency counts for each pair of variable levels

• Cell Counts
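As a minimal sketch of how the cell counts arise, the following Python tallies paired observations of X and Y into an I x J table. The function name `cross_tabulate` and the sample observations are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from collections import Counter

def cross_tabulate(pairs, x_levels, y_levels):
    """Return an I x J list of lists holding the count n_ij for each (x, y) pair."""
    counts = Counter(pairs)
    # Counter returns 0 for pairs that never occur, so empty cells are handled.
    return [[counts[(x, y)] for y in y_levels] for x in x_levels]

# Hypothetical observations of (X, Y); illustrative only.
observations = [
    ("a", "yes"), ("a", "no"), ("b", "yes"),
    ("a", "yes"), ("b", "no"), ("b", "no"),
]
table = cross_tabulate(observations, x_levels=["a", "b"], y_levels=["yes", "no"])
for row in table:
    print(row)
```

Rows correspond to the levels of X and columns to the levels of Y, in the order passed to the function.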

### Cell Proportions

• Often less interested in the absolute counts within cells than in the relationship between the cell proportions

• To analyze cell proportions properly, need to know the experimental design and the relationship between the variables

• All variables can be considered response variables

• One (or more) response variable and one (or more) explanatory variable

• Prospective study

• Retrospective study

### Two Variables

• Proportion notation

• $\{p_{ij}\}$ gives the joint distribution

• $\{p_{i+}\}$ and $\{p_{+j}\}$ represent the marginals

• $\{p_{j|i}\}$ is the conditional distribution of Y given level i of X
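The joint, marginal and conditional distributions can all be derived from a table of counts. The sketch below shows one way to do this in plain Python; the function names and the 2 x 2 counts are hypothetical, used only to make the definitions concrete.

```python
def joint(counts):
    """Joint proportions: each cell count divided by the grand total."""
    n = sum(sum(row) for row in counts)
    return [[c / n for c in row] for row in counts]

def row_marginals(p):
    """Row marginals: sum of the joint proportions over columns."""
    return [sum(row) for row in p]

def col_marginals(p):
    """Column marginals: sum of the joint proportions over rows."""
    return [sum(col) for col in zip(*p)]

def conditional_on_rows(counts):
    """Conditional distribution of Y within each level of X (each row sums to 1)."""
    return [[c / sum(row) for c in row] for row in counts]

# Hypothetical counts, rows = levels of X, columns = levels of Y.
counts = [[30, 10], [20, 40]]
p = joint(counts)
print(p)
print(row_marginals(p), col_marginals(p))
print(conditional_on_rows(counts))
```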

### Example

• 1982 General Social Survey report on attitudes about death penalty and gun registration

• Calculate joint, marginal and conditional distributions

### Prospective Study

• Subjects either select or are selected for treatment groups and then response is studied

• Experimental

• Subjects are randomly allocated to treatment groups

• Observational

• Subjects self-select treatment group

• Principal aim is to compare conditional distribution of response for different levels of explanatory variable(s)

### Example

• Findings from the Aspirin Component of the Ongoing Physicians’ Health Study

• Calculate conditional distribution

### Retrospective Study

• Given response, look back at levels of possible explanatory variables

• Observational studies

• Typically “over-sample” for response level of interest

• If the overall population proportion in each response level is known, Bayes' theorem can be used to calculate the conditional distribution in the direction of interest
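A minimal sketch of that Bayes' theorem step, assuming the study supplies the distribution of the explanatory variable within each response level and the population proportions of the response levels are known from elsewhere. The function name and all numbers are hypothetical.

```python
def reverse_conditional(p_x_given_y, p_y):
    """Turn P(X=i | Y=j) and P(Y=j) into P(Y=j | X=i) with Bayes' theorem.

    p_x_given_y[j][i] = P(X=i | Y=j), as estimated from a retrospective sample.
    p_y[j]            = P(Y=j), the population proportion of response level j.
    Returns result[i][j] = P(Y=j | X=i).
    """
    n_x = len(p_x_given_y[0])
    result = []
    for i in range(n_x):
        # Numerators P(X=i | Y=j) * P(Y=j); their sum is P(X=i).
        numerators = [p_x_given_y[j][i] * p_y[j] for j in range(len(p_y))]
        total = sum(numerators)
        result.append([v / total for v in numerators])
    return result

# Hypothetical inputs: row 0 = cases, row 1 = controls;
# column 0 = exposed, column 1 = not exposed.
p_x_given_y = [[0.4, 0.6], [0.1, 0.9]]
p_y = [0.01, 0.99]  # assumed population proportions of cases and non-cases
p_y_given_x = reverse_conditional(p_x_given_y, p_y)
print(p_y_given_x)
```

Because cases are usually over-sampled, the sample proportion of cases cannot stand in for `p_y`; it has to come from an external source.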

### Example

• England-Wales 1968-1972 study on heart attacks and oral contraceptive use

• Calculate appropriate conditional distribution

### Descriptive Statistics

• Comparing proportions for binary responses

• Difference of proportions

• Relative risk

• Odds ratios

• Independence

• X and Y response: $p_{ij} = p_{i+} p_{+j}$, for all i, j

• That is, $p_{j|i} = p_{+j}$, for all i, j

• X explanatory, Y response: $p_{j|i} = p_{j|h}$, for each j, for all i, h
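The independence condition can be checked numerically by comparing each joint proportion with the product of its marginals. This is only a sketch on exact proportions: the function name, the tolerance and both tables are hypothetical, and it is not a statistical test of independence on sample data.

```python
def is_independent(p, tol=1e-9):
    """Return True when every joint proportion equals the product of its marginals."""
    row = [sum(r) for r in p]
    col = [sum(c) for c in zip(*p)]
    return all(
        abs(p[i][j] - row[i] * col[j]) <= tol
        for i in range(len(p))
        for j in range(len(p[0]))
    )

# Hypothetical joint distributions.
independent_table = [[0.12, 0.28], [0.18, 0.42]]  # marginals (0.4, 0.6) and (0.3, 0.7)
dependent_table = [[0.30, 0.10], [0.20, 0.40]]

print(is_independent(independent_table))
print(is_independent(dependent_table))
```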

### Descriptive Statistics

• I x J tables

• No completely satisfactory way to summarize association

• Pairs of odds ratios

• Concentration coefficient

• Uncertainty coefficient

### Difference of Proportions

• Binary response variable

• Generally, compare response for different explanatory levels

• $p_{1|i} - p_{1|h}$

• Difference lies between -1 and 1

• Independence when difference equals 0 for all i,h and response levels j

• Reasonable measure when absolute difference in proportions is relevant

• Also can compare differences between columns

• Example

### Relative Risk

• Used when relative difference between proportions more relevant than absolute difference

• $p_{1|1} / p_{1|2}$

• Relative risk of 1 corresponds to independence

• Comparison on the second response level gives a different relative risk

• Usually cannot be directly calculated from retrospective studies

### Example

Risk for women having first child at 25 or older = .019 or 1.9%

Risk for women having first child before 25 = .0143 or 1.43%

Relative risk = .019/.0143 = 1.33

Increased risk = 33%
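The same figures can be reproduced from counts. The sketch below assumes that the fractions 31/1597 and 65/4475 quoted in the odds example of these slides are (events, non-events) for the two groups, so the group sizes are 1628 and 4540; the function and variable names are hypothetical.

```python
def risk(events, non_events):
    """Proportion of subjects in a group who experienced the event."""
    return events / (events + non_events)

# Assumed counts: (events, non-events) for each group.
risk_older = risk(31, 1597)    # first child at 25 or older
risk_younger = risk(65, 4475)  # first child before 25

difference = risk_older - risk_younger      # difference of proportions
relative_risk = risk_older / risk_younger   # ratio of proportions

print(round(risk_older, 4), round(risk_younger, 4))
print(round(difference, 4), round(relative_risk, 2))
```

The difference of proportions is under half a percentage point while the relative risk is about 1.33, which illustrates why the two measures can leave different impressions when the baseline risk is small.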

• Example

### Odds Ratio

• For 2x2 table,

• In row 1, odds of being in column 1 instead of column 2: $O_1 = p_{1|1} / p_{2|1}$

• In row 2, odds of being in column 1 instead of column 2: $O_2 = p_{1|2} / p_{2|2}$

• Odds ratio: $O_1 / O_2$

### Odds Ratio

• Takes values > 0

• Sometimes look at log odds ratio

• Invariant to interchanging rows and columns

• Unnecessary to specify response variable

• Unlike difference of proportions and relative risk

### Odds Ratio

• Multiplicative invariance within given row or column

• Like difference of proportions and relative risk

• Equally valid for retrospective, prospective and cross-sectional studies

### Example

Odds for women having first child at 25 or older = 31/1597 = .019/.981 = .0194

Odds for women having first child before 25 = 65/4475 = .0143/.9857 = .0145

Odds ratio = .0194/.0145 = 1.34
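A short sketch of the same calculation from the cell counts, treating 31/1597 and 65/4475 as (events, non-events) in each row. The function name `odds_ratio` is hypothetical. The last lines also check that interchanging rows and columns leaves the odds ratio unchanged.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table with rows (a, b) and (c, d)."""
    # (a / b) / (c / d) written as a cross-product ratio.
    return (a * d) / (b * c)

theta = odds_ratio(31, 1597, 65, 4475)
log_theta = math.log(theta)  # log odds ratio; 0 corresponds to independence

# Transposed table: rows become columns.
theta_transposed = odds_ratio(31, 65, 1597, 4475)

print(round(theta, 2), round(log_theta, 3))
print(theta == theta_transposed)
```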

### Relationship Between Odds Ratio and Relative Risk

• Odds ratio = Relative risk × $(1 - p_{1|2}) / (1 - p_{1|1})$

• When the probability of the outcome of interest is small, regardless of row condition, the odds ratio can be used as an estimate of the relative risk
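The relationship follows directly from the definition of odds in a 2x2 table, where the second conditional proportion in each row is one minus the first. A short derivation, using $\theta$ for the odds ratio (a symbol chosen here, not taken from the slides):

```latex
\begin{aligned}
\theta &= \frac{p_{1|1} / (1 - p_{1|1})}{p_{1|2} / (1 - p_{1|2})} \\
       &= \frac{p_{1|1}}{p_{1|2}} \times \frac{1 - p_{1|2}}{1 - p_{1|1}} \\
       &= \text{relative risk} \times \frac{1 - p_{1|2}}{1 - p_{1|1}}
\end{aligned}
```

With the example values, 1.33 × .9857/.981 is approximately 1.34, which matches the odds ratio computed in the example. The correction factor is close to 1 because both risks are small.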

### Interpreting Risks and Odds

• Assess baseline risk

• Example: Men who drink 16 ounces of beer a day are three times more likely to develop rectal cancer

• Know time period of risk

• Risks accumulate with time

• Example: 1 in 9 women will develop breast cancer over their lifetime. But annual risk of women in their 30’s is 1 in 3700 and women in their 70’s is 1 in 235

• Investigate confounding factors

• Example: Older cars are almost 6 times as likely to be stolen as newer cars

### Simpson’s Paradox

• Survival rates for a standard and a new treatment at two hospitals

### Relative Risk

• Hospital A:

• Risk of dying with standard treatment = 95/100 = .95

• Risk of dying with new treatment = 900/1000 = .90

• Relative risk = .95/.90 = 1.06

### Relative Risk

• Hospital B:

• Risk of dying with standard treatment = 500/1000 = .5

• Risk of dying with new treatment = 5/100 = .05

• Relative risk = .5/.05 = 10.0

### Combined Data

• Group data from both hospitals

• Risk of dying with standard treatment = 595/1100 = .54

• Risk of dying with new treatment = 905/1100 = .82

• Relative risk = .54/.82 = .66
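The reversal can be reproduced from the death counts given on these slides. The sketch below computes the relative risk (standard versus new treatment) within each hospital and again after pooling; the dictionary layout and function names are hypothetical.

```python
# (deaths, patients) for each hospital and treatment, as given on the slides.
data = {
    "A": {"standard": (95, 100), "new": (900, 1000)},
    "B": {"standard": (500, 1000), "new": (5, 100)},
}

def relative_risk(standard, new):
    """Risk of dying with standard treatment divided by risk with new treatment."""
    return (standard[0] / standard[1]) / (new[0] / new[1])

def pooled(treatment):
    """Sum deaths and patients for one treatment across both hospitals."""
    deaths = sum(data[h][treatment][0] for h in data)
    patients = sum(data[h][treatment][1] for h in data)
    return deaths, patients

per_hospital = {h: relative_risk(t["standard"], t["new"]) for h, t in data.items()}
combined = relative_risk(pooled("standard"), pooled("new"))

print({h: round(rr, 2) for h, rr in per_hospital.items()})
print(round(combined, 2))
```

Within each hospital the relative risk is above 1, so the standard treatment looks worse, yet the pooled relative risk is below 1.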

### What is Going On?

• When the data are combined, we lose the information that the patients in Hospital A had BOTH a higher overall death rate AND a higher likelihood of receiving the new treatment

• It is misleading to summarize information over groups, especially if subjects were not randomly assigned to groups

### Confounding Variables

• Television ownership versus movie attendance

### Confounding Variables

• Control for income

### More Examples

• Discrimination in college admission

• Racial bias in death penalty sentences

### College Admission Bias

• Over a given number of years the University of California, Berkeley admitted 44% of all men who applied to any one of six graduate programs and only 30% of women who applied

• Is there evidence of discrimination in graduate admissions at Berkeley?

### Death Penalty Sentences

• Results of a 1981 Florida study of whether the race of a homicide defendant affects the likelihood that the defendant would receive the death penalty

### Question

• Based on data, is race a factor in whether the death penalty is received and if so how is race a factor?