Introduction to Categorical Descriptive Statistics

## PowerPoint Slideshow about ' Introduction to Categorical Descriptive Statistics' - tanner


Overview

- Contingency tables
- Notation

- Descriptive statistics
- Difference in proportions
- Relative risk
- Odds ratio

- SPSS

Contingency Tables

- Two dimensional tables
- Let X and Y be categorical variables
- X has I levels and Y has J levels

- A contingency or cross-classification table is a tabular representation of the frequency counts for each pair of variable levels

Notation

- Cell Counts

Cell Proportions

- Often the absolute counts within cells are of less interest than the relationship between the cell proportions
- To analyze cell proportions properly, need to know the experimental design and the relationship between the variables
- All variables can be considered response variables
- One (or more) response variable and one (or more) explanatory variable
- Prospective study
- Retrospective study

Two Variables

- Proportion notation
- $\{p_{ij}\}$ gives the joint distribution
- $\{p_{i+}\}$ and $\{p_{+j}\}$ represent the marginals
- $\{p_{j|i}\}$ is the conditional distribution of Y given level i of X
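
As a minimal sketch of how the three distributions relate, the snippet below computes them from a small table of counts. The counts are hypothetical and chosen only for illustration; they are not the survey data from the example that follows.

```python
# Hypothetical 2x2 counts (rows: levels of X, columns: levels of Y).
counts = [[30, 10],
          [20, 40]]

n = sum(sum(row) for row in counts)

# Joint distribution: each cell count divided by the grand total.
joint = [[c / n for c in row] for row in counts]

# Marginals: row sums and column sums of the joint distribution.
row_marginal = [sum(row) for row in joint]
col_marginal = [sum(col) for col in zip(*joint)]

# Conditional distribution of Y given level i of X: joint cell / row marginal.
conditional = [[p / row_marginal[i] for p in row]
               for i, row in enumerate(joint)]

for i, row in enumerate(conditional):
    print(f"X level {i + 1}: distribution of Y = {[round(p, 3) for p in row]}")
```

Each row of `conditional` sums to 1, while the joint distribution sums to 1 over the whole table.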

Example

- 1982 General Social Survey report on attitudes about death penalty and gun registration
- Calculate joint, marginal and conditional distributions

Prospective Study

- Subjects either select or are selected for treatment groups and then response is studied
- Experimental
- Subjects are randomly allocated to treatment groups

- Observational
- Subjects self-select treatment group

- Principal aim is to compare conditional distribution of response for different levels of explanatory variable(s)

Example

- Findings from the Aspirin Component of the Ongoing Physicians’ Health Study
- Calculate conditional distribution

Retrospective Study

- Given response, look back at levels of possible explanatory variables
- Observational studies

- Typically “over-sample” for response level of interest
- If the overall population proportion in each response level is known, Bayes' theorem can be used to calculate the conditional distribution in the direction of interest
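
The Bayes' theorem step can be sketched as follows. The function name and the input values are hypothetical, chosen only to illustrate the calculation: a retrospective sample gives the proportion exposed among cases and among controls, and a separately known population proportion of cases converts these into the proportion of cases among the exposed.

```python
def response_given_exposure(p_exp_given_case, p_exp_given_control, p_case):
    """P(case | exposed) from retrospective proportions via Bayes' theorem."""
    numerator = p_exp_given_case * p_case
    denominator = numerator + p_exp_given_control * (1 - p_case)
    return numerator / denominator

# Hypothetical inputs: 40% of cases and 20% of controls were exposed,
# and 1% of the population are cases.
risk_exposed = response_given_exposure(0.40, 0.20, 0.01)
print(f"P(case | exposed) = {risk_exposed:.4f}")
```

If exposure is equally common among cases and controls, the function returns the population proportion of cases unchanged, as expected under independence.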

Example

- England-Wales 1968-1972 study on heart attacks and oral contraceptive use
- Calculate appropriate conditional distribution

Descriptive Statistics

- Comparing proportions for binary responses
- Difference of proportions
- Relative risk
- Odds ratios

- Independence
- X and Y response: $p_{ij} = p_{i+} p_{+j}$, for all i, j
- That is, $p_{j|i} = p_{+j}$, for all i, j

- X explanatory, Y response: $p_{j|i} = p_{j|h}$, for each j, for all i, h
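
The second form of the independence condition follows from the first by the definition of the conditional distribution. A short derivation, assuming every row marginal is positive:

```latex
p_{j|i} = \frac{p_{ij}}{p_{i+}}
        = \frac{p_{i+}\, p_{+j}}{p_{i+}}
        = p_{+j}, \qquad \text{for all } i, j.
```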

Descriptive Statistics

- I x J tables
- No completely satisfactory way to summarize association
- Pairs of odds ratios
- Concentration coefficient
- Uncertainty coefficient

Difference of Proportions

- Binary response variable
- Generally, compare response for different explanatory levels
- $p_{1|i} - p_{1|h}$

- Difference lies between -1 and 1
- Independence when difference equals 0 for all i,h and response levels j
- Reasonable measure when absolute difference in proportions is relevant
- Also can compare differences between columns

Difference of Proportions

- Example

Relative Risk

- Used when relative difference between proportions is more relevant than absolute difference
- $p_{1|1} / p_{1|2}$
- Relative risk of 1 corresponds to independence
- Relative risk for the second response level, $p_{2|1} / p_{2|2}$, is generally different

- Usually cannot be directly calculated from retrospective studies

Example

Risk for women having first child at 25 or older = .019 or 1.9%

Risk for women having first child before 25 = .0143 or 1.43%

Relative risk = .019/.0143 = 1.33

Increased risk = 33%

Relative Risk

- Example

Odds Ratio

- For 2x2 table,
- In row 1, odds of being column 1 instead of column 2: $O_1 = p_{1|1} / p_{2|1}$
- In row 2, odds of being column 1 instead of column 2: $O_2 = p_{1|2} / p_{2|2}$
- Odds ratio: $O_1 / O_2$

Odds Ratio

- Takes values > 0
- Sometimes look at log odds ratio

- Invariant to interchanging rows and columns
- Unnecessary to specify response variable
- Unlike difference of proportions, and relative risk

Odds Ratio

- Multiplicative invariance within given row or column
- Difference of proportions and relative risk share this invariance for rows only, not for columns

- Equally valid for retrospective, prospective and cross-sectional studies

Example

Odds for women having first child at 25 or older = 31/1597 = .019/.981 = .0194

Odds for women having first child before 25 = 65/4475 = .0143/.9857 = .0145

Odds ratio = .0194/.0145 = 1.34
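
The three measures can be recomputed directly from the counts quoted in this example. The sketch below is illustrative only; the function and variable names are not from the slides, and the outcome is left unnamed because the transcript does not state it.

```python
# Counts from the example: (with outcome, without outcome).
older = (31, 1597)    # first child at 25 or older
younger = (65, 4475)  # first child before 25

def risk(with_outcome, without_outcome):
    """Proportion of the group with the outcome."""
    return with_outcome / (with_outcome + without_outcome)

def odds(with_outcome, without_outcome):
    """Odds of the outcome within the group."""
    return with_outcome / without_outcome

risk_older = risk(*older)
risk_younger = risk(*younger)

difference = risk_older - risk_younger
relative_risk = risk_older / risk_younger
odds_ratio = odds(*older) / odds(*younger)

print(f"difference of proportions = {difference:.4f}")
print(f"relative risk             = {relative_risk:.2f}")
print(f"odds ratio                = {odds_ratio:.2f}")
```

Working from the raw counts reproduces the rounded values on the slides: a relative risk of 1.33 and an odds ratio of 1.34.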

Relationship Between Odds Ratio and Relative Risk

- Odds ratio = Relative risk × $(1 - p_{1|2}) / (1 - p_{1|1})$
- When the probability of the outcome of interest is small regardless of row, the odds ratio can be used as an estimate of the relative risk
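
A quick numerical check of this relationship, using two pairs of risks that appear elsewhere in this transcript: the rare outcome from the first-child example and the common outcome from Hospital B in the Simpson's paradox example. The helper names are illustrative.

```python
def relative_risk(p1, p2):
    return p1 / p2

def odds_ratio(p1, p2):
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

rare = (0.019, 0.0143)  # first-child example: both risks are small
common = (0.5, 0.05)    # Hospital B: one risk is large

for label, (p1, p2) in (("rare", rare), ("common", common)):
    rr = relative_risk(p1, p2)
    o_r = odds_ratio(p1, p2)
    # Identity from the slide: OR = RR * (1 - p2) / (1 - p1)
    assert abs(o_r - rr * (1 - p2) / (1 - p1)) < 1e-12
    print(f"{label}: relative risk = {rr:.2f}, odds ratio = {o_r:.2f}")
```

For the rare outcome the two measures nearly coincide; for the common outcome the odds ratio (19) is almost twice the relative risk (10), so it is no longer a usable approximation.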

Interpreting Risks and Odds

- Assess baseline risk
- Example: Men who drink 16 ounces of beer a day are three times more likely to develop rectal cancer

- Know time period of risk
- Risks accumulate with time
- Example: 1 in 9 women will develop breast cancer over their lifetime. But annual risk of women in their 30’s is 1 in 3700 and women in their 70’s is 1 in 235

- Investigate confounding factors
- Example: Older cars are almost 6 times as likely to be stolen as newer cars

Simpson’s Paradox

- Survival rates for a standard and a new treatment at two hospitals

Relative Risk

- Hospital A:
- Risk of dying with standard treatment = 95/100 = .95
- Risk of dying with new treatment = 900/1000 = .90
- Relative risk = .95/.90 = 1.06

Relative Risk

- Hospital B:
- Risk of dying with standard treatment = 500/1000 = .5
- Risk of dying with new treatment = 5/100 = .05
- Relative risk = .5/.05 = 10.0

Combined Data

- Group data from both hospitals
- Risk of dying with standard treatment = 595/1100 = .54
- Risk of dying with new treatment = 905/1100 = .82
- Relative risk = .54/.82 = .66

What is Going On?

- When the data are combined, we lose the information that patients in Hospital A had BOTH a higher overall death rate AND a higher likelihood of receiving the new treatment
- Summarizing information over groups can be misleading, especially if subjects were not randomly assigned to groups
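
The reversal can be reproduced from the counts on the preceding slides. This is a minimal sketch with an illustrative data layout and illustrative helper names:

```python
# Counts from the slides: (deaths, patients) per hospital and treatment.
data = {
    "A": {"standard": (95, 100), "new": (900, 1000)},
    "B": {"standard": (500, 1000), "new": (5, 100)},
}

def relative_risk(standard, new):
    """Risk of dying with standard treatment relative to new treatment."""
    return (standard[0] / standard[1]) / (new[0] / new[1])

def pooled(treatment):
    """Add deaths and patients across hospitals for one treatment."""
    deaths = sum(t[treatment][0] for t in data.values())
    patients = sum(t[treatment][1] for t in data.values())
    return deaths, patients

per_hospital = {h: relative_risk(t["standard"], t["new"])
                for h, t in data.items()}
combined = relative_risk(pooled("standard"), pooled("new"))

for h, rr in per_hospital.items():
    print(f"Hospital {h}: relative risk = {rr:.2f}")
print(f"Combined: relative risk = {combined:.2f}")

# The information lost by pooling: overall death rate and the share of
# patients given the new treatment, by hospital.
for h, t in data.items():
    deaths = t["standard"][0] + t["new"][0]
    patients = t["standard"][1] + t["new"][1]
    print(f"Hospital {h}: death rate = {deaths / patients:.2f}, "
          f"share on new treatment = {t['new'][1] / patients:.2f}")
```

Within each hospital the relative risk exceeds 1, favoring the new treatment, yet the pooled relative risk falls below 1 because most new-treatment patients were in the hospital with the higher death rate.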

Confounding Variables

- Television ownership versus movie attendance

Confounding Variables

- Control for income

More Examples

- Discrimination in college admission
- Racial bias in death penalty sentences

College Admission Bias

- Over a given number of years the University of California, Berkeley admitted 44% of all men who applied to any one of six graduate programs and only 30% of women who applied
- Is there evidence of discrimination in graduate admissions at Berkeley?

Death Penalty Sentences

- Results of a 1981 Florida study of whether the race of a homicide defendant affects the likelihood that the defendant would receive the death penalty

Question

- Based on data, is race a factor in whether the death penalty is received and if so how is race a factor?
